Questions on Attention Sinks and Their Usage in LLM Models

Holiday_Fly_590@alien.topB to LocalLLaMA · English · 2 years ago · alien.top · 5 comments
  • dqUu3QlS@alien.topB · 2 years ago

    What’s the question?

    • esotericloop@alien.topB · 2 years ago

      See, you’re attending to the initial token across all layers and heads. :P
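      (That observation, that attention mass concentrates on the very first token across layers and heads, is easy to check with stock transformers. A rough sketch, using GPT-2 only as a small stand-in model:)

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

        inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)

        # out.attentions holds one (batch, heads, seq, seq) tensor per layer.
        for i, attn in enumerate(out.attentions):
            # Average over heads and query positions: share of attention on token 0.
            sink_share = attn[0, :, :, 0].mean().item()
            print(f"layer {i:2d}: mean attention to first token = {sink_share:.3f}")
        # On most autoregressive LMs this share is disproportionately large,
        # which is the "attention sink" effect the thread is about.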

  • Tiny_Nobody6@alien.topB · 2 years ago

    IYH kindly post the paper link

  • Knopty@alien.topB · 2 years ago

    If you are wondering whether it could be implemented: there was a modified transformers library. The author essentially patched transformers, renamed the result to attention_sinks, and presented it as a drop-in replacement:

    https://github.com/tomaarsen/attention_sinks/

    But that was impossible to maintain, so the transformers devs suggested he submit a patch to transformers itself and maintain it there, so the feature could be properly incorporated into the library and stay future-proof.

    The author has been working on that patch since the beginning of October:

    https://github.com/huggingface/transformers/pull/26681
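    A rough usage sketch of that drop-in library, going by its README; the import path and keyword names (attention_sink_size, attention_sink_window_size) are assumptions based on that README and may have changed since:

      # Sketch, not a verified snippet: assumes attention_sinks exposes
      # drop-in replacements for the transformers Auto classes.
      from transformers import AutoTokenizer
      from attention_sinks import AutoModelForCausalLM

      model_id = "meta-llama/Llama-2-7b-hf"
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          device_map="auto",
          attention_sink_size=4,            # initial tokens kept as "sinks"
          attention_sink_window_size=1020,  # sliding window of recent tokens
      )
      tokenizer = AutoTokenizer.from_pretrained(model_id)

      # Generation can then run past the normal context limit, because the KV
      # cache evicts old tokens while always keeping the first few sink tokens.
      inputs = tokenizer("Attention sinks in one paragraph:", return_tensors="pt").to(model.device)
      output = model.generate(**inputs, max_new_tokens=128)
      print(tokenizer.decode(output[0], skip_special_tokens=True))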

    • WAHNFRIEDEN@alien.topB · 2 years ago

      it’s already implemented in llama.cpp
