Reference material

BLOOM : BigScience Large Open-science Open-access Multilingual Language Model / introducing the largest multilingual model

joannekim0420 2022. 11. 1. 16:21

https://huggingface.co/bigscience/bloom

 

The official Hugging Face model card already documents this model very well..

 


[Figure: distribution of languages used in the training data]

Quick summary

  • BLOOM is a multilingual (59 languages in total: 46 natural languages + 13 programming languages) generative LLM released by BigScience.
  • Its training process was carried out in full transparency, and the model has 176 billion parameters.
  • BLOOM is the first language model with over 100B parameters to be trained this openly, trained for several months on a cluster of 416 A100 80GB GPUs.
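As a rough sanity check on why a cluster of that size is needed, here is a back-of-the-envelope sketch. The 16-bytes-per-parameter figure is a common approximation for mixed-precision Adam training, not a number from the model card:

```python
# Back-of-the-envelope memory estimate for training a 176B-parameter model.
# Assumption: ~16 bytes of GPU state per parameter under mixed-precision Adam
# (fp16 weights + fp16 grads + fp32 master weights + two fp32 optimizer moments).
PARAMS = 176_247_271_424                   # exact count from the BLOOM model card

weights_gb = PARAMS * 2 / 1024**3          # bf16 weights alone (inference-time)
train_state_gb = PARAMS * 16 / 1024**3     # approximate training-time state
cluster_gb = 416 * 80                      # aggregate memory of 416 A100 80GB GPUs

print(f"bf16 weights:    {weights_gb:,.0f} GB")
print(f"training state: ~{train_state_gb:,.0f} GB")
print(f"cluster memory:  {cluster_gb:,} GB")
```

Even one copy of the bf16 weights (over 300 GB) exceeds any single GPU, and the training-time state runs into terabytes, which is why tensor/pipeline parallelism across hundreds of GPUs is required.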

Model Type: Transformer-based Language Model

 

 

Model Architecture and Objective

  • Modified from Megatron-LM GPT2 (see paper, BLOOM Megatron code):
  • Decoder-only architecture
  • Layer normalization applied to the word embeddings layer (StableEmbedding; see code, paper)
  • ALiBi positional encodings (see paper), with GeLU activation functions
  • 176,247,271,424 parameters:
    • 3,596,615,680 embedding parameters
    • 70 layers, 112 attention heads
    • Hidden layers are 14336-dimensional
    • Sequence length of 2048 tokens used (see BLOOM tokenizer, tokenizer description)
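The headline parameter count can be reproduced from the listed hyperparameters. The sketch below assumes a standard GPT-style decoder block (fused QKV + output projection, a 4x-wide MLP, two LayerNorms per layer, plus the embedding-layer LayerNorm mentioned above and an assumed final LayerNorm); the vocabulary size is inferred by dividing the embedding parameter count by the hidden size, not quoted from the card:

```python
# Reconstruct BLOOM's parameter count from the numbers in the model card.
h = 14336                      # hidden dimension
layers = 70
emb = 3_596_615_680            # embedding parameters (vocab * h)
vocab = emb // h               # inferred vocabulary size

# Per layer: QKV (3h*h) + attn out (h*h) + MLP (h*4h + 4h*h) = 12h^2 weights,
# plus biases (3h + h + 4h + h = 9h) and two LayerNorms (2 * 2h = 4h) = 13h.
per_layer = 12 * h * h + 13 * h

# Embedding LayerNorm (StableEmbedding) and a final LayerNorm: 2h each.
total = emb + layers * per_layer + 2 * 2 * h
print(vocab, total)   # 250880 176247271424
```

The reconstruction matches the quoted 176,247,271,424 exactly, which is good evidence the assumed layer layout is the right one.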

Objective Function: Cross Entropy with mean reduction (see API documentation).
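Concretely, "cross entropy with mean reduction" is the negative log-probability of each target token, averaged over all positions. A minimal pure-Python version (toy logits for illustration, not the actual training code; equivalent in spirit to `torch.nn.CrossEntropyLoss(reduction="mean")`):

```python
import math

def cross_entropy_mean(logits, targets):
    """Mean negative log-likelihood over token positions.

    logits:  list of per-token score vectors (one row per position)
    targets: list of correct token ids, one per position
    """
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)                                   # stabilize the softmax
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[t]                        # -log softmax(row)[t]
    return total / len(targets)

# Toy example: two positions, vocabulary of size 3.
loss = cross_entropy_mean([[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]], [0, 1])
print(round(loss, 4))   # -> 0.2132
```

The mean reduction makes the loss comparable across batches of different lengths, which is the standard choice for language-model pretraining.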

 

 

Official announcement blog post : https://bigscience.huggingface.co/blog/bloom

 
