Udemy - LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO

  • Category: Other
  • Type: Tutorials
  • Language: English
  • Total size: 1.8 GB
  • Uploaded by: freecoursewb
  • Downloads: 60
  • Last checked: Mar. 22nd '26
  • Date uploaded: Mar. 22nd '26
  • Seeders: 9
  • Leechers: 22

Infohash : F59A7AB8A353750371BF1F139AF5B0A92032B223

LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO

https://WebToolTip.com

Last updated 6/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz, 2 Ch
Language: English | Duration: 3h 45m | Size: 1.85 GB

[EN] LLM Fine-Tuning and Reinforcement Learning with SFT, LoRA, DPO, and GRPO on Custom Data with Hugging Face

What you'll learn
You will grasp the core principles of Large Language Models (LLMs) and the overall structure behind their training processes.
You will learn the differences between base models and instruct models, as well as the methods for preparing data for each.
You’ll learn data preprocessing techniques along with essential tips: how to identify the special tokens required by models, how to understand data formats, and methods for preparing data accordingly.
You’ll gain practical, hands-on experience and detailed knowledge of how LoRA and the data collator work.
You’ll gain a detailed understanding of crucial hyperparameters used in training, including their purpose and how they function.
You’ll practically learn, in detail, how trained LoRA matrices are merged with the base model, as well as key considerations and best practices to follow during the merge.
You’ll learn what Direct Preference Optimization (DPO) is, how it works, the expected data format, and the specific scenarios in which it’s used.
You’ll learn key considerations when preparing data for DPO, as well as understanding how the DPO data collator functions.
You’ll learn about the specific hyperparameters used in DPO training, their roles, and how they function.
You’ll learn how to upload your trained model to platforms like Hugging Face and manage hyperparameters effectively after training.
You’ll learn in detail how Group Relative Policy Optimization (GRPO), a reinforcement learning method, works, including an in-depth understanding of its learning process.
You’ll learn how to prepare data specifically for Group Relative Policy Optimization (GRPO).
You’ll learn how to create reward functions (the most critical aspect of GRPO) through various practical reward function examples.
You’ll learn thoroughly in what format data is sent to GRPO reward functions and how that data can be processed within them.
You’ll learn how to define rewards within functions and establish clear reward templates for GRPO.
You’ll practically learn numerous details, such as extracting reward-worthy parts from raw responses and defining rewards based on these extracted segments.
You’ll learn how to transform an Instruct model into one capable of generating “Chain of Thought” reasoning through GRPO (Group Relative Policy Optimization).
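Several of the objectives above concern merging trained LoRA matrices back into the base model. As a toy numeric sketch (dimensions, values, and variable names are invented here, not taken from the course materials), merging folds the scaled low-rank update B @ A into the frozen weight W, after which the adapter path is no longer needed:

```python
import numpy as np

# Toy illustration of a LoRA merge. All shapes and values are invented
# for demonstration; they do not come from the course.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 4

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA "down" projection (rank r)
B = rng.normal(size=(d_out, r))     # LoRA "up" projection

def lora_forward(x):
    # During training: base path plus the scaled low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

def merged_weight():
    # At merge time, the update is folded into one dense matrix.
    return W + (alpha / r) * (B @ A)

x = rng.normal(size=d_in)
# The merged weight reproduces the adapter forward pass exactly.
assert np.allclose(lora_forward(x), merged_weight() @ x)
```

The same idea underlies library-level merge utilities: after folding, the model is a plain dense model again, with no extra inference cost from the adapter.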
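The DPO objectives mention an expected data format. As an illustrative sketch using the field names common in the Hugging Face TRL convention (prompt / chosen / rejected; the course's exact schema may differ, and the record content below is invented), each preference example pairs a prompt with a preferred and a dispreferred response:

```python
# A minimal DPO-style preference dataset. Field names follow the common
# TRL convention; the example content is invented for illustration.
dpo_examples = [
    {
        "prompt": "Explain what LoRA is in one sentence.",
        "chosen": "LoRA fine-tunes a model by training small low-rank matrices added to frozen weights.",
        "rejected": "LoRA is a long-range radio protocol.",
    },
]

def validate_dpo_record(record):
    """Check that a record has the three required string fields."""
    required = ("prompt", "chosen", "rejected")
    return all(isinstance(record.get(key), str) for key in required)
```

A quick validation pass like this catches missing or non-string fields before the trainer's data collator pads and batches the pairs.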
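The GRPO objectives emphasize reward functions that extract reward-worthy parts from raw responses and score them. A minimal sketch, assuming completions arrive as plain strings and that answers are wrapped in think/answer tags (the tag names, scores, and function names here are illustrative assumptions, not the course's actual code):

```python
import re

# Hypothetical format reward: 1.0 if a completion wraps its reasoning in
# <think>...</think> followed by <answer>...</answer>, else 0.0.
def format_reward(completions, **kwargs):
    pattern = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# Hypothetical correctness reward: extract the text inside <answer> and
# compare it against a reference answer supplied with the batch.
def correctness_reward(completions, answer, **kwargs):
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        extracted = match.group(1).strip() if match else ""
        rewards.append(2.0 if extracted == str(ref).strip() else 0.0)
    return rewards
```

In GRPO, several such functions are typically summed per completion, and rewards are compared within each group of sampled completions for the same prompt to form relative advantages.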

Requirements
Basic knowledge of Python programming.
Introductory-level familiarity with artificial intelligence and machine learning concepts.
Ideally, prior experience with Jupyter Notebook or Google Colab.

Files:

[ WebToolTip.com ] Udemy - LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO
  • Get Bonus Downloads Here.url (0.2 KB)
  • ~Get Your Files Here ! 1 - Introduction
    • 1. Introduction.mp4 (11.4 MB)
    • 2. Course Content Introduction.mp4 (47.7 MB)
    • 3. Jupyter Notebooks.html (5.4 KB)
    • Notebooks 2
      • Bolum_(Section)_1.ipynb (465.1 KB)
      • Bolum_(Section)_3_DPO.ipynb (259.4 KB)
      • Bolum_(Section)_4_GRPO_.ipynb (624.2 KB)
      • Bolum_(Section)__2.ipynb (207.9 KB)
      • DS_Store (6.0 KB)
      • Quantization.ipynb (81.9 KB)
      • Thinking__(REASONING)_model.ipynb (54.8 KB)
      __MACOSX Notebooks 2
      • _.DS_Store (0.1 KB)
      • _Bolum_(Section)_1.ipynb (0.7 KB)
      • _Bolum_(Section)_3_DPO.ipynb (0.4 KB)
      • _Bolum_(Section)_4_GRPO_.ipynb (0.5 KB)
      • _Bolum_(Section)__2.ipynb (0.2 KB)
      • _Quantization.ipynb (0.4 KB)
      • _Thinking__(REASONING)_model.ipynb (0.2 KB)
      2 - Quantization, LoRA, SFT, Data Collator, Data Preparation…
      • 10. Preparing Dataset, Chat Template, and Integrating Custom Tokens.en_US.srt (13.3 KB)
      • 10. Preparing Dataset, Chat Template, and Integrating Custom Tokens.mp4 (145.9 MB)
      • 11. Continuing Dataset Preparation and Tokenization.en_US.srt (5.6 KB)
      • 11. Continuing Dataset Preparation and Tokenization.mp4 (47.0 MB)
      • 12. What is a Data Collator How Does It Work Practical Example.en_US.srt (9.1 KB)
      • 12. What is a Data Collator How Does It Work Practical Example.mp4 (84.6 MB)
      • 13. What is LoRA Why Use It.en_US.srt (3.4 KB)
      • 13. What is LoRA Why Use It.mp4 (17.0 MB)
      • 14. Integrating LoRA Matrices into the Model.en_US.srt (7.6 KB)
      • 14. Integrating LoRA Matrices into the Model.mp4 (37.6 MB)
      • 15. Setting Training Arguments (Training Hyperparameters).en_US.srt (9.8 KB)
      • 15. Setting Training Arguments (Training Hyperparameters).mp4 (32.1 MB)
      • 16. Setting Trainer, Starting Training, and Evaluating Results.en_US.srt (3.9 KB)
      • 16. Setting Trainer, Starting Training, and Evaluating Results.mp4 (21.4 MB)
      • 17. Merging Trained LoRA Matrices with the Model.en_US.srt (6.8 KB)
      • 17. Merging Trained LoRA Matrices with the Model.mp4 (51.0 MB)
      • 18. Uploading Model on Hugging Face and Using it.en_US.srt (5.7 KB)
      • 18. Uploading Model on Hugging Face and Using it.mp4 (49.4 MB)
      • 19. Hyperparameters Affecting the Outputs.en_US.srt (6.5 KB)
      • 19. Hyperparameters Affecting the Outputs.mp4 (30.3 MB)
      • 4. Quantization.ipynb.bin (81.9 KB)
      • 4. What is Quantization How does it affect model size and parameters.en_US.srt (4.9 KB)
      • 4. What is Quantization How does it affect model size and parameters.mp4 (40.2 MB)
      • 5. Create a Hugging Face Account and Get a Token.en_US.srt (5.0 KB)
      • 5. Create a Hugging Face Account and Get a Token.mp4 (35.1 MB)
      • 6. Create a Colab Notebook and Get Familiar with the Libraries.en_US.srt (4.7 KB)
      • 6. Create a Colab Notebook and Get Familiar with the Libraries.mp4 (14.7 MB)
      • 7. Bolum_(Section)_1.ipynb.bin (465.1 KB)
      • 7. Download the Model with Quantization.en_US.srt (6.8 KB)
      • 7. Download the Model with Quantization.mp4 (27.5 MB)
      • 8. Bolum_(Section)_1.ipynb.bin (465.0 KB)
      • 8. Differences Between Base and Instruct Models.en_US.srt (8.5 KB)
      • 8. Differences Between Base and Instruct Models.mp4 (78.0 MB)
      • 9. Download and Examine the Dataset.en_US.srt (4.7 KB)
      • 9. Download and Examine the Dataset.mp4 (18.9 MB)
      3 - Adding New Tokens and Creating Templates for the Tokenizer
      • 20. Bolum_(Section)__2.ipynb.bin (207.9 KB)
      • 20. Download the Model and Tokenizer.en_US.srt (4.6 KB)
      • 20. Download the Model and Tokenizer.mp4 (37.0 MB)
      • 21. Adding New Custom Tokens to the Tokenizer.en_US.srt (8.0 KB)
      • 21. Adding New Custom Tokens to the Tokenizer.mp4 (30.9 MB)
      • 22. Creating Templates with New Custom Tokens and Integrating Them into the Dataset.en_US.srt (7.7 KB)
      • 22. Creating Templates with New Custom Tokens and Integrating Them into the Dataset.mp4 (28.7 MB)
      4 - DPO (Direct Preference Optimization)
      • 23. Bolum_(Section)_3_DPO.ipynb.bin (259.4 KB)
      • 23. What is DPO What Data Format Does It Expect.en_US.srt (7.5 KB)
      • 23. What is DPO What Data Format Does It Expect.mp4 (43.4 MB)
      • 24. Bolum_(Section)_3_DPO.ipynb.bin (259.4 KB)
      • 24. Downloading Model & Understanding How the DPO Data Collator do Padding.en_US.srt (7.1 KB)
      • 24. Downloading Model & Understanding How the DPO Data Collator do Padding.mp4 (45.4 MB)
      • 25. Preparing the Dataset for DPO.en_US.srt (10.9 KB)
      • 25. Preparing the Dataset for DPO.mp4 (84.4 MB)
      • 26. Adding LoRA Matrices to the Model.en_US.srt (3.8 KB)
      • 26. Adding LoRA Matrices to the Model.mp4 (19.1 MB)
      • 27. Setting Training Arguments (with DPOConfig).en_US.srt (5.4 KB)
      • 27. Setting Training Arguments (with DPOConfig).mp4 (13.3 MB)
      • 28. Training the Model and Merging the LoRA Matrices.en_US.srt (6.9 KB)
      • 28. Training the Model and Merging the LoRA Matrices.mp4 (49.7 MB)
      5 - GRPO (Group Relative Policy Optimization) Reinforcement Learning
      • 29. Bolum_(Section)_4_GRPO_.ipynb.bin (624.2 KB)
      • 29. Thinking__(REASONING)_model.ipynb.bin (54.8 KB)
      • 29. What is a “Reasoning” Model How Does It Work.en_US.srt (5.0 KB)
      • 29. What is a “Reasoning” Model How Does It Work.mp4 (56.5 MB)
      • 30. What is GRPO How Is It Applied.en_US.srt (4.9 KB)
      • 30. What is GRPO How Is It Applied.mp4 (21.4 MB)
      • 31. Bolum_(Section)_4_GRPO_.ipynb.bin (624.2 KB)
      • 31. What are Unsloth and VLLM + Download the Model.en_US.srt (6.9 KB)
      • 31. What are Unsloth and VLLM + Download the Model.mp4 (62.7 MB)
      • 32. Examining the Dataset and Initial Preparation Steps.en_US.srt (7.6 KB)
      • 32. Examining the Dataset and Initial Preparation Steps.mp4 (54.0 MB)
      • 33. Extracting Specific Parts of Data Regex and Group Operations.en_US.srt (13.5 KB)
      • 33. Extracting Specific Parts of Data Regex and Group Operations.mp4 (49.5 MB)
      • 34. In Which Format is Data Sent to Reward Functions.en_US.srt (7.0 KB)
      • 34. In Which Format is Data Sen

Code:

  • udp://tracker.torrent.eu.org:451/announce
  • udp://tracker.tiny-vps.com:6969/announce
  • http://tracker.foreverpirates.co:80/announce
  • udp://tracker.cyberia.is:6969/announce
  • udp://exodus.desync.com:6969/announce
  • udp://explodie.org:6969/announce
  • udp://tracker.opentrackr.org:1337/announce
  • udp://9.rarbg.to:2780/announce
  • udp://tracker.internetwarriors.net:1337/announce
  • udp://ipv4.tracker.harry.lu:80/announce
  • udp://open.stealth.si:80/announce
  • udp://9.rarbg.to:2900/announce
  • udp://9.rarbg.me:2720/announce
  • udp://opentor.org:2710/announce