collection of resources for understanding attention and implementing attention models
- https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
- RNN shortcomings in remembering long sequences -> translation models are more effective when the source sequence is reversed
- memory bandwidth limitations, as observed in WaveRNN
- alternatives
- 2D causal conv nets for seq2seq: https://arxiv.org/abs/1808.03867
- convolutional seq2seq, expands on the seq2seq inspiration for ettts: https://arxiv.org/abs/1705.03122
- transformer, attention is all you need: https://arxiv.org/abs/1706.03762
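The transformer above reduces to one core equation, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal sketch (shapes and usage are my own illustration, not code from any repo linked here):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); mask broadcastable to (batch, seq_q, seq_k), 0 = blocked
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)                    # one attention distribution per query
    return weights @ v, weights

q = k = v = torch.randn(2, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 10])
```

The n x n weights matrix per sequence is the O(n^2) cost noted under the wildml link below.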
- https://towardsdatascience.com/memory-attention-sequences-37456d271992
- read
- diagram attention model
- review of attention: https://blog.heuritech.com/2016/01/20/attention-mechanism/
- read
- diagram attention model
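The attention model these reviews diagram is the classic additive (Bahdanau-style) kind: a small MLP scores every encoder state against the current decoder state, and a softmax over the scores gives a context vector. A minimal sketch with illustrative dimensions (not code from the posts):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)             # (batch, src_len)
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)   # (batch, enc_dim)
        return context, weights

attn = AdditiveAttention(enc_dim=32, dec_dim=48, attn_dim=16)
context, weights = attn(torch.randn(4, 7, 32), torch.randn(4, 48))
print(context.shape, weights.shape)  # torch.Size([4, 32]) torch.Size([4, 7])
```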
- workshop demonstrating other memory and attention architectures: http://www.thespermwhale.com/jaseweston/ram/
- VGG architecture
- another review of attention: https://distill.pub/2016/augmented-rnns/
- read
- NTM
- Attention
- adaptive computation time
- neural programmer
- reinforcement learning
- discussion of downside of attention as well as applications: http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
- O(n^2) attention values for a length-n sequence
- RL for attention model: http://arxiv.org/abs/1406.6247
- image captioning: http://arxiv.org/abs/1502.03044
- parse trees: http://arxiv.org/abs/1412.7449
- comprehension: http://arxiv.org/abs/1506.03340
- idea of attention as fuzzy memory - similar to NTM
- end to end memory: http://arxiv.org/abs/1503.08895
- NTM: https://github.com/dennybritz/deeplearning-papernotes/blob/master/neural-turing-machines.md
- RLNTM: http://arxiv.org/abs/1505.00521
- illustrated transformer: https://jalammar.github.io/illustrated-transformer/
- annotated transformer: http://nlp.seas.harvard.edu/2018/04/03/attention.html
- alternative architectures
- some combination of recurrence and attention
- fast weights: https://arxiv.org/abs/1610.06258
- https://medium.com/@sanyamagarwal/understanding-attentive-recurrent-comparators-ea1b741da5c3
- https://towardsdatascience.com/memory-attention-sequences-37456d271992
- hierarchical attention
- similar to temporal convolution
- similar to wavenet
- TCN vs RNN: https://arxiv.org/abs/1803.01271
- TCNs outperform RNNs (LSTMs) on a variety of sequence modeling tasks (causal conv sketch below)
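A minimal sketch of the causal, dilated 1D convolution that TCN/WaveNet-style models stack in place of recurrence; left-padding by (kernel_size - 1) * dilation keeps every output from seeing future timesteps. Channel sizes here are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); pad only on the left so the conv stays causal
        return self.conv(F.pad(x, (self.pad, 0)))

x = torch.randn(2, 16, 100)
layer = CausalConv1d(16, 32, kernel_size=3, dilation=2)
print(layer(x).shape)  # torch.Size([2, 32, 100]) -- same length, no future leakage
```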
- neural Turing machine
- unsupervised pre-training of a transformer language model: https://blog.openai.com/language-unsupervised/
- attention for medical image segmentation
- simple implementation of techniques
- https://github.com/e-lab/pytorch-demos/tree/master/seq-learning-basic
- reimplement in notebook
- CNN
- RNN
- ATT
- https://github.com/e-lab/pytorch-demos/tree/master/seq-learning-char
- reimplement in notebook
- CNN
- RNN
- ATT
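As a starting point for the notebook reimplementations (not taken from the e-lab code; just stock PyTorch modules for the three families, assuming a (batch, length, features) layout and illustrative sizes):

```python
import torch
import torch.nn as nn

nb, L, nh = 8, 64, 32                      # batch, sequence length, feature size
x = torch.randn(nb, L, nh)

# CNN: temporal convolution over the sequence (Conv1d wants (batch, channels, time))
cnn = nn.Conv1d(nh, nh, kernel_size=3, padding=1)
y_cnn = cnn(x.transpose(1, 2)).transpose(1, 2)          # (nb, L, nh)

# RNN: a GRU over the same sequence
rnn = nn.GRU(nh, nh, batch_first=True)
y_rnn, _ = rnn(x)                                       # (nb, L, nh)

# ATT: single-layer self-attention
att = nn.MultiheadAttention(embed_dim=nh, num_heads=4, batch_first=True)
y_att, attn_weights = att(x, x, x)                      # (nb, L, nh), (nb, L, L)

print(y_cnn.shape, y_rnn.shape, y_att.shape, attn_weights.shape)
```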
- positional encodings
- none
- absolute
- sinusoidal
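A minimal sketch of the sinusoidal variant, following the formula in the transformer paper; the absolute variant is typically just an nn.Embedding over position indices.

```python
import math
import torch

def sinusoidal_positions(seq_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = torch.arange(seq_len).unsqueeze(1).float()                  # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                                         # add to (seq_len, d_model) inputs

pe = sinusoidal_positions(seq_len=100, d_model=32)
print(pe.shape)  # torch.Size([100, 32])
```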
- figure out why swapping the axes of the data and the convolution makes training faster but results worse (layout sketch below)
- fast & bad: nb x 1 x nh x L -> conv2d -> nb x nh x 1 x L
- slow & good: nb x 1 x L x nh -> conv2d -> nb x nh x L x 1
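A sketch of the two layouts being compared, assuming nb/nh/L mean batch/hidden/length and the conv2d collapses the nh axis while sliding over L; the kernel sizes are my guess at the setup, not taken from the repo. The two should agree up to a transpose, so the thing to isolate is which axis ends up contiguous in memory during the convolution.

```python
import torch
import torch.nn as nn

nb, nh, L, k = 8, 32, 128, 5

# "fast & bad": (nb, 1, nh, L) -> (nb, nh, 1, L); L is the innermost (contiguous) axis
conv_fast = nn.Conv2d(1, nh, kernel_size=(nh, k), padding=(0, k // 2))
out_fast = conv_fast(torch.randn(nb, 1, nh, L))
print(out_fast.shape)   # torch.Size([8, 32, 1, 128])

# "slow & good": (nb, 1, L, nh) -> (nb, nh, L, 1); nh is the innermost axis
conv_slow = nn.Conv2d(1, nh, kernel_size=(k, nh), padding=(k // 2, 0))
out_slow = conv_slow(torch.randn(nb, 1, L, nh))
print(out_slow.shape)   # torch.Size([8, 32, 128, 1])
```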
- test out limiting the attention window (mask sketch below)
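One way to prototype the limited window: a band mask that lets each position attend only within +/- w of itself (window size here is arbitrary).

```python
import torch

def window_mask(seq_len, w):
    idx = torch.arange(seq_len)
    # True where |i - j| <= w, i.e. allowed query/key pairs
    return (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= w

mask = window_mask(seq_len=8, w=2)          # (8, 8) boolean
scores = torch.randn(8, 8)
scores = scores.masked_fill(~mask, float('-inf'))
weights = torch.softmax(scores, dim=-1)     # each row only mixes nearby positions
```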
- test the source implementation of attention to check whether the attention model is doing anything
- submit a pull request to the repo to correct the model
- see if attention works for generating sequences of lengths different from the training sequence length
- inspect attention matrices
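nn.MultiheadAttention already returns the head-averaged attention weights when need_weights=True, so the matrices can be pulled out and plotted as heatmaps (sizes illustrative):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

att = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(1, 20, 32)
_, attn = att(x, x, x, need_weights=True)      # attn: (1, 20, 20), averaged over heads

plt.imshow(attn[0].detach().numpy(), aspect='auto', origin='lower')
plt.xlabel('key position')
plt.ylabel('query position')
plt.colorbar()
plt.show()
```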
- mask attention to make it causal
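A minimal sketch of the causal (lower-triangular) mask:

```python
import torch

seq_len = 8
causal = torch.tril(torch.ones(seq_len, seq_len)).bool()   # True = allowed (j <= i)
scores = torch.randn(seq_len, seq_len)
scores = scores.masked_fill(~causal, float('-inf'))
weights = torch.softmax(scores, dim=-1)   # row i puts zero weight on future positions j > i
```

PyTorch also ships a helper for this, nn.Transformer.generate_square_subsequent_mask, which returns a float mask with -inf above the diagonal.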
- figure out how to pad to the receptive field to speed up generation