LIPT: Improving Prompt Tuning with Late Inception Reparameterization
Figure 1. Overview of the LIPT framework. Yellow denotes trainable (tunable) modules; blue denotes frozen (non-trainable) modules. Left: the prompt generator with three bottleneck branches. The initialized prompt passes through three bottleneck branches of different sizes, and the branch outputs are added back to the initial prompt (a connection pattern similar to the Inception architecture). Right: the architecture of the transformer-based backbone model, with its forward-propagation and backpropagation paths. During forward propagation, the prompt generator on the left is attached in parallel to the designated "Prompt Layer" of the model on the right; the generated prompt is concatenated with the output of the Prompt Layer and passed to the next layer. During backpropagation, only the prompt generator network is fine-tuned.
Figure 2. Evaluation on one single-sentence task and one sentence-pair task. (Left): comparison of five prompt initialization methods. (Right): effect of adding a self-connection.
Figure 3. LIPT performance on different tasks. (Left): comparison of the number of bottleneck branches. (Right): comparison of bottleneck sizes.
Figure 4. Performance trends on two single-sentence tasks and two sentence-pair tasks at different insertion layers. The RoBERTa-large backbone is used, with even-numbered layers between the 10th and 24th selected. The shaded area shows the mean and standard deviation over 3 random runs.
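The Figure 1 caption describes the prompt generator as three bottleneck branches whose outputs are added back to the initial prompt before the result is concatenated with the hidden states of the chosen Prompt Layer. The PyTorch sketch below illustrates only that connection pattern; the branch widths follow the appendix setting proj_down_size ∈ {64, 128, 256}, while the activation, initialization scale, and batch expansion are assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn


class InceptionPromptGenerator(nn.Module):
    """Sketch of an Inception-style prompt generator: the initial soft prompt
    passes through several bottleneck branches of different widths, and the
    branch outputs are summed back onto the initial prompt (residual add)."""

    def __init__(self, hidden_size=1024, num_prompt_tokens=10,
                 bottleneck_sizes=(64, 128, 256)):
        super().__init__()
        # Trainable initial prompt embeddings (the backbone model stays frozen).
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size) * 0.02)
        # One down-projection / non-linearity / up-projection per branch.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, r),
                nn.ReLU(),
                nn.Linear(r, hidden_size),
            )
            for r in bottleneck_sizes
        ])

    def forward(self, batch_size):
        # Residual connection: initial prompt plus the sum of all branch outputs.
        out = self.prompt + sum(branch(self.prompt) for branch in self.branches)
        # Expand to the batch so it can be concatenated with the Prompt Layer output.
        return out.unsqueeze(0).expand(batch_size, -1, -1)
```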
Abstract
1. Introduction
- Introduction of LIPT, a novel soft prompt generation method based on an Inception structure.
- Comprehensive evaluation of LIPT with RoBERTa-large and GPT-2 backbones, showing improved performance on ten standard text classification tasks in both full-data and few-shot scenarios.
- Introduction of an Efficiency Indicator metric for comprehensive performance evaluation of PEFT methods.
2. Related Work
2.1. Parameter-Efficient Fine-Tuning
2.2. Prompt Tuning
2.3. Reparameterization-Based Methods
3. Method
3.1. Problem Formulation
3.2. Late Inception Prompt Tuning
3.3. Design Choices
3.4. Efficiency Indicator
3.4.1. Metric Selection
3.4.2. Accuracy Normalization
3.4.3. Training Cost Indicator Composition
- Training Speed Normalization: Since higher training speed implies lower cost, we apply inverse normalization to training speed.
- Parameter Size Logarithmic Normalization: To handle the large scale of parameter values, we use logarithmic normalization with base 10 (both normalizations are sketched after this list).
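One plausible realization of these two steps, assumed here because the exact formulas are not reproduced in this outline, is min-max inverse normalization of the training speed s and min-max normalization of the base-10 logarithm of the parameter count p:

```latex
% Hypothetical forms consistent with the bullets above; the paper's exact
% constants and ranges are not reproduced here.
\[
  \tilde{s} = \frac{s_{\max} - s}{s_{\max} - s_{\min}}, \qquad
  \tilde{p} = \frac{\log_{10} p - \log_{10} p_{\min}}
                   {\log_{10} p_{\max} - \log_{10} p_{\min}}
\]
% s: training speed (tokens/ms); p: number of tunable parameters.
```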
3.4.4. Nonlinear Adjustment Using the Sigmoid Function
- k controls the slope of the curve, with larger values yielding sharper transitions around the midpoint.
- The midpoint parameter (denoted x0 in the sketch below) sets the center of the distribution, allowing the cost adjustment to be centered around the median or mean value.
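For reference, the standard logistic function matching this description, with slope k and midpoint written here as x0 (a naming assumption, since the paper's symbol is not shown in this outline), is:

```latex
\[
  \sigma(x) = \frac{1}{1 + e^{-k\,(x - x_{0})}}
\]
```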
3.4.5. Calculation of the Final Efficiency Indicator
3.4.6. Summary
4. Experiments
4.1. Datasets
4.2. Experiment Settings
4.3. Baselines
5. Results and Discussion
5.1. Full-Data Results
5.2. Few-Shot Results
5.3. Training Cost
6. Analysis
6.1. Soft Prompt Initialization
6.2. Exploration of LIPT Structure
6.2.1. Xception-like Approach
6.2.2. Bottleneck Sizes
6.3. Layer Insertion Index
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Implementation Details
| Hyperparameter | RoBERTa (Full-Data) | RoBERTa (Few-Shot) | GPT2 (Full-Data) | GPT2 (Few-Shot) |
|---|---|---|---|---|
| #Layers | 24 | 24 | 36 | 36 |
| Hidden size | 1024 | 1024 | 1280 | 1280 |
| Dropout rate | 0.1 | 0.1 | 0.1 | 0.1 |
| Peak learning rate | 1.00 × 10⁻³ | 1.00 × 10⁻³ | 1.00 × 10⁻³ | 1.00 × 10⁻³ |
| Warmup rate | 0.06 | 0.06 | 0.06 | 0.06 |
| Batch size | {16, 32} | {8, 16, 32} | {8, 16} | {4, 8, 16} |
| Weight decay | 0.1 | 0.1 | 0.1 | 0.1 |
| Training steps | \ | 500 | \ | 500 |
| Training epochs | 10 | \ | 10 | \ |
| num_prompt_tokens | 10 | 10 | 10 | 10 |
| proj_down_size | {64, 128, 256} | {64, 128, 256} | {64, 128, 256} | {64, 128, 256} |
References
- Hittawe, M.M.; Harrou, F.; Togou, M.A.; Sun, Y.; Knio, O. Time-series weather prediction in the Red sea using ensemble transformers. Appl. Soft Comput. 2024, 164, 111926. [Google Scholar] [CrossRef]
- Harrou, F.; Zeroual, A.; Hittawe, M.M.; Sun, Y. Chapter 6—Recurrent and convolutional neural networks for traffic management. In Road Traffic Modeling and Management; Harrou, F., Zeroual, A., Hittawe, M.M., Sun, Y., Eds.; Elsevier: Amsterdam, The Netherlands, 2022; pp. 197–246. [Google Scholar]
- Floridi, L.; Chiriatti, M. GPT-3: Its nature, scope, limits, and consequences. Minds Mach. 2020, 30, 681–694. [Google Scholar] [CrossRef]
- Fedus, W.; Zoph, B.; Shazeer, N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 2022, 23, 1–39. [Google Scholar]
- Lialin, V.; Deshpande, V.; Rumshisky, A. Scaling down to scale up: A guide to parameter-efficient fine-tuning. arXiv 2023, arXiv:2303.15647. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Fu, Z.; Yang, H.; So, A.M.-C.; Lam, W.; Bing, L.; Collier, N. On the effectiveness of parameter-efficient fine-tuning. Proc. AAAI Conf. Artif. Intell. 2023, 37, 12799–12807. [Google Scholar] [CrossRef]
- Ding, N.; Qin, Y.; Yang, G.; Wei, F.; Yang, Z.; Su, Y.; Hu, S.; Chen, Y.; Chan, C.-M.; Chen, W. Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models. arXiv 2022, arXiv:2203.06904. [Google Scholar]
- Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 3045–3059. [Google Scholar]
- Li, X.L.; Liang, P. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 4582–4597. [Google Scholar]
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799. [Google Scholar]
- Rücklé, A.; Geigle, G.; Glockner, M.; Beck, T.; Pfeiffer, J.; Reimers, N.; Gurevych, I. AdapterDrop: On the Efficiency of Adapters in Transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 7930–7946. [Google Scholar]
- Karimi Mahabadi, R.; Henderson, J.; Ruder, S. Compacter: Efficient Low-Rank Hypercomplex Adapter Layers. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; pp. 1022–1035. [Google Scholar]
- He, S.; Ding, L.; Dong, D.; Zhang, J.; Tao, D. SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 2184–2190. [Google Scholar]
- Zaken, E.B.; Ravfogel, S.; Goldberg, Y. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv 2021, arXiv:2106.10199. [Google Scholar]
- Liu, X.; Sun, T.; Huang, X.; Qiu, X. Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1325–1338. [Google Scholar]
- Zhu, W.; Tan, M. SPT: Learning to Selectively Insert Prompts for Better Prompt Tuning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 11862–11878. [Google Scholar]
- Razdaibiedina, A.; Mao, Y.; Khabsa, M.; Lewis, M.; Hou, R.; Ba, J.; Almahairi, A. Residual Prompt Tuning: Improving prompt tuning with residual reparameterization. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 6740–6757. [Google Scholar]
- Xiao, Y.; Xu, L.; Li, J.; Lu, W.; Li, X. Decomposed Prompt Tuning via Low-Rank Reparameterization. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 13335–13347. [Google Scholar]
- Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 353–355. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Aghajanyan, A.; Zettlemoyer, L.; Gupta, S. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 7319–7328. [Google Scholar]
- Zhang, A.; Tay, Y.; Zhang, S.; Chan, A.; Luu, A.T.; Hui, S.C.; Fu, J. Beyond fully-connected layers with quaternions: Parameterization of hypercomplex multiplications with 1/n parameters. arXiv 2021, arXiv:2102.08597. [Google Scholar]
- Peters, M.E.; Ruder, S.; Smith, N.A. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy, 2 August 2019; pp. 7–14. [Google Scholar]
- Vu, T.; Lester, B.; Constant, N.; Al-Rfou, R.; Cer, D. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 5039–5059. [Google Scholar]
- Asai, A.; Salehi, M.; Peters, M.; Hajishirzi, H. ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6655–6672. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
- Gu, Y.; Han, X.; Liu, Z.; Huang, M. Ppt: Pre-trained prompt tuning for few-shot learning. arXiv 2021, arXiv:2109.04332. [Google Scholar]
- Edalati, A.; Tahaei, M.; Kobyzev, I.; Nia, V.P.; Clark, J.J.; Rezagholizadeh, M. Krona: Parameter efficient tuning with kronecker adapter. arXiv 2022, arXiv:2212.10650. [Google Scholar]
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient Finetuning of Quantized LLMs. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 10088–10115. [Google Scholar]
- Lv, K.; Yang, Y.; Liu, T.; Gao, Q.; Guo, Q.; Qiu, X. Full parameter fine-tuning for large language models with limited resources. arXiv 2023, arXiv:2306.09782. [Google Scholar]
- Nawrot, P.; Chorowski, J.; Łańcucki, A.; Ponti, E.M. Efficient transformers with dynamic token pooling. arXiv 2022, arXiv:2211.09761. [Google Scholar]
- Wiebe, J.; Wilson, T.; Cardie, C. Annotating expressions of opinions and emotions in language. Lang. Resour. Eval. 2005, 39, 165–210. [Google Scholar] [CrossRef]
- Pang, B.; Lee, L. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 115–124. [Google Scholar]
- Pang, B.; Lee, L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, Spain, 21–26 July 2004; pp. 271–278. [Google Scholar]
- Voorhees, E.M.; Tice, D.M. Building a question answering test collection. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 24–28 July 2000; pp. 200–207. [Google Scholar]
- Liu, X.; Ji, K.; Fu, Y.; Tam, W.; Du, Z.; Yang, Z.; Tang, J. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Dublin, Ireland, 22–27 May 2022; pp. 61–68. [Google Scholar]
- Wu, Z.; Wang, S.; Gu, J.; Hou, R.; Dong, Y.; Vydiswaran, V.G.V.; Ma, H. IDPG: An Instance-Dependent Prompt Generation Method. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 5507–5521. [Google Scholar]
| Method | Tunable Parameters | Training Speed (Token/ms, ↑) | Memory Cost (GB, ↓) | Backprop | EI |
|---|---|---|---|---|---|
| Model Tuning | 355 M | 11.6 | 23.5 | no | 1.06 |
| Adapter | 1.6 M | 15.5 (1.3×) | 16.5 (29.8%) | no | 1.62 |
| AdapterDrop | 811 K | 21.6 (1.9×) | 9.5 (59.6%) | no | 2.07 |
| Prompt Tuning | 21 K | 16.9 (1.5×) | 17.8 (24.3%) | no | 0.51 |
| P-Tuning V2 | 985 K | 19.2 (1.7×) | 16.8 (28.5%) | no | 1.29 |
| S-IDPG-PHM | 114 K | 12.0 (1.0×) | 16.8 (28.5%) | no | 0.49 |
| BitFit | 273 K | 16.5 (1.4×) | 15.7 (33.2%) | no | 2.03 |
| LoRA | 788 K | 16.4 (1.4×) | 16.2 (31.1%) | no | 1.88 |
| LPT | 792 K | 23.2 (2.0×) | 15.5 (34.0%) | yes | 1.54 |
| LIPT | 931 K | 25.7 (2.2×) | 14.1 (40.0%) | yes | 2.15 |
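As an illustration of how the Efficiency Indicator of Section 3.4 could tie the columns of this table together, the sketch below combines normalized accuracy with a sigmoid-adjusted training cost. The composition, the equal weighting of the two cost terms, and the default ranges, k, and x0 are assumptions for illustration only, not the paper's exact formula; its absolute values need not match the EI column above, it is only meant to show the direction of each term.

```python
import math


def efficiency_indicator(acc, speed, params,
                         acc_range=(80.0, 95.0),      # assumed accuracy pool (min, max)
                         speed_range=(11.6, 25.7),    # tokens/ms over the compared methods
                         params_range=(21e3, 355e6),  # tunable-parameter pool (min, max)
                         k=10.0, x0=0.5):
    """Hypothetical EI composition following the outline of Section 3.4:
    normalized accuracy divided by a sigmoid-adjusted training cost.
    The paper's exact weights and ranges are not reproduced here."""
    # Accuracy normalization (higher is better).
    acc_n = (acc - acc_range[0]) / (acc_range[1] - acc_range[0])
    # Inverse normalization of training speed (higher speed -> lower cost).
    speed_cost = (speed_range[1] - speed) / (speed_range[1] - speed_range[0])
    # Base-10 logarithmic normalization of the parameter count.
    log_p, log_lo, log_hi = (math.log10(v) for v in (params, *params_range))
    param_cost = (log_p - log_lo) / (log_hi - log_lo)
    # Average the two cost terms (assumed equal weights) and squash with a
    # sigmoid of slope k centered at x0, as described in Section 3.4.4.
    raw_cost = 0.5 * (speed_cost + param_cost)
    adj_cost = 1.0 / (1.0 + math.exp(-k * (raw_cost - x0)))
    return acc_n / adj_cost
```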
| Category | Dataset | \|Train\| | \|Dev\| | \|Test\| | \|Y\| | Type | Labels |
|---|---|---|---|---|---|---|---|
| Single-sentence | SST-2 | 67,349 | 872 | 1821 | 2 | sentiment | positive, negative |
| Single-sentence | MPQA | 7606 | 1000 | 2000 | 2 | opinion polarity | positive, negative |
| Single-sentence | MR | 7662 | 1000 | 2000 | 2 | sentiment | positive, negative |
| Single-sentence | Subj | 7000 | 1000 | 2000 | 2 | subjectivity | subjective, objective |
| Single-sentence | TREC | 4952 | 500 | 500 | 6 | question cls. | abbr., entity, description, human, loc., num. |
| Sentence pair | MNLI | 392,702 | 19,647 | 19,643 | 3 | NLI | entailment, neutral, contradiction |
| Sentence pair | MRPC | 3668 | 408 | 1725 | 2 | paraphrase | equivalent, not equivalent |
| Sentence pair | QNLI | 104,743 | 5463 | 5463 | 2 | NLI | entailment, not entailment |
| Sentence pair | QQP | 363,846 | 40,430 | 390,965 | 2 | paraphrase | equivalent, not equivalent |
| Sentence pair | RTE | 2490 | 277 | 3000 | 2 | NLI | entailment, not entailment |
| Method | Tunable Parameters | SST-2 (acc) | MPQA (acc) | MR (acc) | Subj (acc) | TREC (acc) | MNLI (acc) | MRPC (acc and F1) | QNLI (acc) | QQP (acc and F1) | RTE (acc) | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model Tuning ‡ | 355 M | 95.6 | 90.2 | 91.3 | 96.8 | 97.6 | 89.3 | 91.2 | 94.6 | 90.7 | 86.2 | 92.4 |
| Adapter ‡ | 1.6 M | 96.2 (0.2) | 89.2 (0.5) | 91.6 (0.4) | 96.8 (0.4) | 97.0 (0.3) | 90.5 (0.1) | 90.3 (1.0) | 94.7 (0.3) | 89.4 (0.7) | 85.5 (1.2) | 92.3 |
| AdapterDrop ‡ | 811 K | 95.3 (0.3) | 89.1 (0.7) | 91.0 (0.5) | 95.3 (0.6) | 95.7 (0.5) | 88.5 (0.2) | 90.1 (1.3) | 93.3 (0.3) | 88.3 (0.3) | 81.1 (2.0) | 90.8 |
| BitFit ‡ | 273 K | 95.9 (0.1) | 89.2 (0.9) | 91.8 (0.5) | 96.9 (0.1) | 96.2 (0.3) | 90.0 (0.1) | 89.6 (0.9) | 94.4 (0.2) | 87.9 (0.4) | 82.4 (1.1) | 91.4 |
| LoRA ‡ | 788 K | 96.2 (0.3) | 90.1 (0.3) | 92.0 (0.1) | 97.1 (0.4) | 96.8 (0.6) | 89.8 (0.3) | 91.1 (0.6) | 94.8 (0.2) | 89.8 (0.1) | 84.8 (2.1) | 92.3 |
| Prompt Tuning ‡ | 21 K | 94.9 (0.5) | 88.8 (0.8) | 89.6 (0.5) | 93.9 (0.6) | 86.4 (0.7) | 86.7 (0.9) | 75.7 (0.7) | 91.4 (0.1) | 81.2 (0.8) | 60.8 (0.5) | 84.9 |
| P-tuning v2 ‡ | 985 K | 95.8 (0.4) | 89.9 (0.6) | 91.4 (0.4) | 96.5 (0.2) | 95.8 (0.6) | 88.2 (0.2) | 86.5 (2.1) | 93.7 (0.3) | 85.3 (0.2) | 66.9 (2.3) | 89.0 |
| S-IDPG-PHM ‡ | 114 K | 94.8 (0.3) | 89.5 (0.6) | 90.8 (0.5) | 95.9 (0.6) | 89.3 (0.4) | 87.4 (0.5) | 77.3 (1.2) | 91.2 (0.4) | 82.3 (1.9) | 62.7 (1.9) | 86.1 |
| LPT | 792 K | 95.26 (0.3) | 90.90 (0.2) | 91.07 (0.6) | 96.93 (0.4) | 89.53 (1.9) | 86.01 (0.2) | 84.34 (1.0) | 90.99 (0.4) | 83.50 (0.1) | 77.86 (0.6) | 88.6 |
| LIPT | 931 K | 95.07 (0.3) | 91.03 (0.1) | 91.00 (0.5) | 97.73 (0.4) | 94.00 (0.9) | 86.88 (0.2) | 87.68 (1.3) | 91.06 (0.2) | 85.25 (0.1) | 79.78 (1.3) | 90.0 |
| Method | Tunable Parameters | Subj (acc) | TREC (acc) | MRPC (acc and F1) | RTE (acc) | Avg |
|---|---|---|---|---|---|---|
| Model Tuning | 774 M | 97.2 | 97.0 | 88.0 | 75.8 | 89.5 |
| Prompt Tuning | 26 K | 88.8 (1.0) | 82.7 (1.1) | 75.1 (0.5) | 53.7 (1.3) | 75.1 |
| LPT | 990 K | 96.67 (0.1) | 91.67 (0.3) | 80.86 (0.6) | 68.59 (0.6) | 84.45 |
| LIPT | 1.2 M | 96.83 (0.2) | 90.73 (0.2) | 80.63 (1.0) | 68.71 (1.8) | 84.23 |
| Method | Tunable Parameters | SST-2 (acc) | MPQA (acc) | MR (acc) | Subj (acc) | TREC (acc) | MNLI (acc) | MRPC (acc and F1) | QNLI (acc) | QQP (acc and F1) | RTE (acc) | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model Tuning | 355 M | 91.4 (0.8) | 87.2 (1.1) | 89.4 (0.6) | 95.1 (0.4) | 95.4 (0.5) | 75.3 (2.1) | 85.1 (1.8) | 85.2 (0.9) | 77.3 (1.2) | 67.0 (7.7) | 84.8 |
| Prompt Tuning | 21 K | 91.1 (1.5) | 74.7 (5.1) | 88.3 (0.6) | 86.4 (0.4) | 81.7 (2.4) | 45.5 (1.5) | 74.6 (0.3) | 58.1 (1.6) | 52.6 (5.8) | 61.2 (1.7) | 71.4 |
| P-tuning v2 | 985 K | 91.3 (0.3) | 85.1 (1.6) | 88.0 (1.5) | 94.5 (0.4) | 94.6 (0.8) | 61.6 (2.7) | 76.6 (1.8) | 73.7 (2.4) | 71.7 (1.8) | 56.0 (1.1) | 79.3 |
| S-IDPG-PHM | 114 K | 91.3 (0.5) | 75.9 (3.8) | 88.7 (0.4) | 87.2 (0.6) | 84.7 (2.1) | 46.3 (1.1) | 75.1 (0.8) | 59.4 (0.7) | 56.4 (3.0) | 64.7 (1.7) | 73.0 |
| LPT | 792 K | 91.70 (0.8) | 89.94 (1.2) | 89.42 (0.6) | 93.65 (1.1) | 83.61 (2.6) | 69.44 (4.3) | 79.16 (2.0) | 79.32 (1.9) | 76.25 (1.0) | 72.28 (1.5) | 82.48 |
| LIPT | 931 K | 91.18 (0.2) | 89.16 (0.5) | 88.96 (1.1) | 94.21 (1.1) | 85.69 (1.1) | 71.76 (1.7) | 78.79 (1.3) | 76.44 (1.1) | 74.90 (1.2) | 74.32 (2.2) | 82.54 |
RoBERTa-large

| Method | Subj | TREC | MRPC | RTE | Avg |
|---|---|---|---|---|---|
| LXPT | 97.50 (0.6) | 93.60 (0.5) | 86.91 (1.2) | 78.34 (0.4) | 89.10 |
| LIPT | 97.73 (0.4) | 94.00 (0.9) | 87.68 (1.3) | 79.78 (1.3) | 89.80 |

GPT2-large

| Method | Subj | TREC | MRPC | RTE | Avg |
|---|---|---|---|---|---|
| LXPT | 96.80 (0.1) | 89.13 (0.9) | 80.29 (1.2) | 67.63 (3.8) | 83.46 |
| LIPT | 96.83 (0.2) | 90.73 (0.2) | 80.63 (1.0) | 68.71 (1.8) | 84.23 |