BERT (memo)

BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding)

Introduction

Existing GPT model: uses a Transformer decoder for autoregressive lang...