Posted on 2022-07-28, 17:53. Authored by Elias Jacob de Menezes-Neto, Marco Bruno Miranda Clementino.
The text of a court ruling is split into chunks of 512 tokens each. Each chunk is passed through a Portuguese BERT model, from which we collect the embedding of the [CLS] token. The sequence of chunk embeddings is then fed to an LSTM, which condenses them into a single document vector. This vector is passed to a classifier head to produce the final classification.
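A minimal PyTorch sketch of the LSTM-over-chunks step described above. Random tensors stand in for the per-chunk BERT [CLS] embeddings; the hidden size, number of classes, and single-layer LSTM are illustrative assumptions, not the authors' actual hyperparameters.

```python
import torch
import torch.nn as nn

class ChunkSequenceClassifier(nn.Module):
    """Condense per-chunk [CLS] embeddings with an LSTM, then classify.

    Hidden size and class count are assumptions for illustration only.
    """
    def __init__(self, emb_dim=768, hidden_dim=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, chunk_embeddings):
        # chunk_embeddings: (batch, num_chunks, emb_dim)
        _, (h_n, _) = self.lstm(chunk_embeddings)
        doc_vector = h_n[-1]          # last hidden state: one vector per document
        return self.head(doc_vector)  # (batch, num_classes)

# Stand-in for the [CLS] embeddings of 5 chunks of one ruling
# (in the actual pipeline these would come from a Portuguese BERT model).
embeddings = torch.randn(1, 5, 768)
logits = ChunkSequenceClassifier()(embeddings)
print(logits.shape)  # torch.Size([1, 2])
```

In the real pipeline, `embeddings` would be built by running each 512-token chunk through the BERT model and stacking the resulting [CLS] vectors in document order.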