Digital Sustainability Group

NLP Seminar Autumn 2024

Time: Fridays, 10:15 to 12:00
Location: Room 105, University of Bern main building, Hochschulstrasse 4, Bern
ILIAS Course: https://ilias.unibe.ch/goto_ilias3_unibe_crs_3102245.html
KSL Entry: https://ksl.unibe.ch/KSL/kurzansicht?28&stammNr=471397&semester=HS2024&lfdNr=0

Responsible for the seminar: PD Dr. Matthias Stürmer
Main lecturer of the seminar: Luca Rolshoven

This seminar offers a conceptual and practical introduction to modern Natural Language Processing (NLP). It covers recent advances such as transformer architectures, BERT, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), alongside foundational methods such as Bag-of-Words (BoW), TF-IDF, and word2vec. Some sessions feature guest speakers from academia or industry, who offer students additional perspectives.

Before each session, students study the reading material and prepare questions for discussion. This preparation deepens their understanding and builds analytical skills. In addition, participants carry out an NLP project, which they present at the end of the seminar.

This seminar is mandatory for all students conducting a bachelor's or master's thesis at the Research Center for Digital Sustainability.

Upon successful completion of this course, you will …

  • know the most important methods of NLP
  • be able to understand papers in the field of NLP
  • have planned, executed and presented a project in the field of NLP
  • have developed your presentation skills
  • be able to understand and critically comment on the presentations of your fellow students

 

Schedule 2024

Date: 20 September 2024
Topic: Introduction to NLP
Reading Material: None

Date: 27 September 2024
Topic: Text Preprocessing and Language Basics
Reading Material:
  • NLP Course - Tokenizers (Hugging Face NLP Course)
  • Tokenizers Quicktour (Hugging Face Tokenizers Library)
Guest Speaker: Veton Matoshi, Researcher at Bern University of Applied Sciences

Date: 4 October 2024
Topic: Classical Machine Learning for NLP
Reading Material:
  • Analyzing Documents with TF-IDF (Tutorial)
  • Sentiment Analysis Using Naive Bayes (Course Notes)

Date: 11 October 2024
Topic: Word Embeddings and Vector Space Models
Reading Material:
  • Chapter 8: Distributional Semantics and Word Embeddings, in Text Analytics for Corpus Linguistics and Digital Humanities: Simple R Scripts and Tools (access using university login)
Additional reading material (optional):
  • Efficient Estimation of Word Representations in Vector Space (word2vec Paper)
  • Global Vectors for Word Representation (GloVe Paper)
  • Deep Contextualized Word Representations (ELMo Paper)
Guest Speaker: Prof. Dr. Gerold Schneider, Professor of Computational Linguistics and Head of LiRI NLP group at University of Zürich

Date: 18 October 2024
Topic: Transformer Architecture
Reading Material:
  • Attention is All You Need (Transformer Paper)
  • The Illustrated Transformer (Blog Post)

Date: 25 October 2024
Topic: Introduction to Student Project
Reading Material: None

Date: 1 November 2024
Topic: Pre-Training
Reading Material:
  • LLaMA: Open and Efficient Foundation Language Models
Additional reading material (optional):
  • Improving Language Understanding by Generative Pre-Training (GPT Paper)
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper)
Guest Speaker: Leandro von Werra, Chief Loss Officer at Hugging Face

Date: 8 November 2024
Topic: LLM Inference
Reading Material:
  • A Guide to Quantization in LLMs (Blog Post)
  • Fast Inference from Transformers via Speculative Decoding (Paper)
Additional reading material (optional):
  • Flash Attention 3 (Blog Post)
  • vLLM (Blog Post)
  • Text Generation Inference (TGI; Hugging Face Library)

Date: 15 November 2024
Topic: Fine-Tuning
Reading Material:
  • FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning (Paper)
  • Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters (Blog Post)
  • Understanding Mixed Precision Training (Blog Post)
Additional links:
  • SFT Trainer (Hugging Face TRL Library)
Guest Speaker: Dr. Joel Niklaus, Research Scientist at Harvey

Date: 22 November 2024
Topic: Alignment
Reading Material:
  • Hugging Face Blog Post about RLHF
  • Direct Preference Optimization (Paper)
  • Everything you need to know about Fine-tuning and Merging LLMs: Maxime Labonne (Video)
Guest Speaker: Lewis Tunstall, LLM Engineering & Research at Hugging Face

Date: 29 November 2024
Topic: NLP in Industry
Reading Material: None
Guest Speaker: Flurin Gishamer, Senior Data Scientist at Open Systems

Date: 6 December 2024
Topic: Large Language Models and Applications
Reading Material:
  • A Comprehensive Overview of Large Language Models (Paper)
Guest Speaker: Prof. Dr. Marcel Gygli, Professor for AI in the Public Sector at Bern University of Applied Sciences

Date: 13 December 2024
Topic: Emerging Trends & Student Project Presentations
Reading Material: None

Date: 20 December 2024
Topic: Student Project Presentations
Reading Material: None