GBNF, a rebranding of Backus-Naur Form, is kind of like Regex, if you somehow made Regex more obtuse and clunky and also way less powerful. It’s like going to the dentist in text form. It is bad, and should feel bad.

HOWEVER, if you tame this vile beast of a language, you can make AI respond to you in pretty much any way you like. And you should.

You can use it by pasting GBNF into SillyTavern, Oobabooga, or probably whatever else you might be using. First, click on the settings thingie, then scroll down and paste the grammar in. Just pasting is enough.

In Ooba, you can go to

https://preview.redd.it/0j7nhuj23fxb1.png?width=521&format=png&auto=webp&s=82688cee191ddbbdc1bf5789e2dcb0e99693a7bf

And then

https://preview.redd.it/kcbur3s53fxb1.png?width=794&format=png&auto=webp&s=6b31c1a6c5f954bc2bbbe1488b0a71d164478de9

Note that not all loaders support it; I think it’s limited to llama.cpp, transformers, and the _HF variants.

From then on, the model's replies will follow the format you specified. In this case, every message will be "quoted text", *action text*, or several of those in a row separated by whitespace. It should be simple to understand.

Here’s that one in case you want it; I just wrote it and tested it:

root ::= (actions | quotes) (whitespace (actions | quotes))*

actions ::= "*" content "*"
quotes ::= "\"" content "\""

content ::= [^*"]+

whitespace ::= space | tab | newline
space ::= " "
tab ::= "\t"
newline ::= "\n"

Even if you don’t know Regex, this language should be easy to pick up, and it will let you make LLMs always respond in a particular format (very useful in some cases!).
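If you'd like to sanity-check what the grammar above accepts without loading a model, here is a rough Python sketch that mirrors its rules with a regular expression. This is my own translation, not something the loaders do internally, and the names `ITEM`, `ROOT`, and `matches_format` are made up for illustration.

```python
import re

# One *action* or one "quote"; the content is [^*"]+ exactly as in the grammar.
ITEM = r'(?:\*[^*"]+\*|"[^*"]+")'

# root ::= item (whitespace item)*
# Note that whitespace is exactly ONE space, tab, or newline between items.
ROOT = re.compile(ITEM + r'(?:[ \t\n]' + ITEM + r')*')


def matches_format(text: str) -> bool:
    """Return True if the whole text fits the actions/quotes grammar."""
    return ROOT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ['"Hello there." *waves*', 'Hello there.', '"Hi"  *waves*']:
        print(repr(sample), matches_format(sample))
```

One thing this makes visible: because `whitespace` is a single character, two spaces between a quote and an action are rejected, and plain unquoted text is never allowed.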

You can also look at the examples.

There are websites to test BNF like this one, but since it’s a badly designed, badly implemented language from hell, none of them will work, and you will have to look at the console to find out why this ugly duckling of a language didn’t want to work this time. Imagine if Batch files had regular expressions; it’d probably look like this. All of that said, this is pretty fucking useful! So thanks to whoever did the heavy lifting to implement this.

  • davidy22@alien.topB

    I won’t stand for this slander of GBNF; there’s a reason why programming language grammar is defined in GBNF and not regexes.

    • Dead_Internet_Theory@alien.topOPB

      Actually, GBNF is the re-branding; BNF is the proper name (the G is Georgi Gerganov’s). There’s also a reason why languages compile to assembly, but that doesn’t mean assembly is user-friendly. Same with Abstract Syntax Trees. Some stuff pretty much only applies to compilers, and that doesn’t make it a good general-purpose solution.

      Though I must imagine implementing BNF is orders of magnitude easier than implementing the monster that is extended regular expressions.

  • Calandiel@alien.topB

    I dunno in what bubble EBNF is worse than regex. It's more general (especially as a notation), easier to parse both for humans and computers, about as easy to codegen, and is used all over the place in programming language design. The only advantage regex has is brevity and being able to fit it in a one-liner for quick filtered search.

  • DarthNebo@alien.topB

    I’d rather stick with a Pydantic declaration via LangChain than something that has to be written so much by hand.

  • FPham@alien.topB

    GBNF is 10x more readable than regex, but neither one is very human friendly.

  • dicklesworth@alien.topB

    GBNF is super powerful, and anyone developing software with local LLMs should learn how to use it. As part of my larger open source project, Swiss Army Llama, I recently made a couple of very handy tools for working with GBNF grammars. You can supply either an example JSON or a Pydantic data model, and it will automatically generate the complete GBNF grammar for you reflecting the same fields. It even supports some degree of nested fields. And there is another tool for taking a complete GBNF grammar specification and validating it. You can see how I implemented these particular tools here:
    https://github.com/Dicklesworthstone/swiss_army_llama/blob/main/grammar_builder.py

    Or if you just want to use the tools, you can install my project:
    https://github.com/Dicklesworthstone/swiss_army_llama/tree/main

    And just find the relevant endpoints in the Swagger page, which makes it super easy to try them out.
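To give a feel for the "example JSON in, grammar out" idea, here is a minimal Python sketch. It is not the Swiss Army Llama implementation (see the linked `grammar_builder.py` for that); the function names are hypothetical, it only handles a flat object with string, number, and boolean values, and its `string` rule is simplified to forbid quotes and backslashes instead of handling escapes.

```python
import json

# Shared GBNF rules appended to every generated grammar.
# Simplification: strings may not contain quotes or backslashes.
PRIMITIVES = r'''
string ::= "\"" [^"\\]* "\""
number ::= "-"? [0-9]+ ("." [0-9]+)?
boolean ::= "true" | "false"
ws ::= [ \t\n]*
'''.strip()


def rule_for(value) -> str:
    """Pick the GBNF rule name for one example value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def grammar_from_example(example_json: str) -> str:
    """Build a GBNF grammar that forces the same keys, in the same order."""
    example = json.loads(example_json)
    if not isinstance(example, dict) or not example:
        raise ValueError("expected a non-empty flat JSON object")
    fields = []
    for key, value in example.items():
        # First dumps: the key as a JSON string ("name").
        # Second dumps: that text escaped as a GBNF literal ("\"name\"").
        key_literal = json.dumps(json.dumps(key))
        fields.append(f'{key_literal} ws ":" ws {rule_for(value)}')
    body = ' ws "," ws '.join(fields)
    root = f'root ::= "{{" ws {body} ws "}}"'
    return root + "\n" + PRIMITIVES


if __name__ == "__main__":
    print(grammar_from_example('{"name": "Ada", "age": 36, "admin": false}'))
```

The printed grammar can be pasted into a grammar box the same way as a hand-written one. Fixing the key order keeps the sketch short; a real tool would likely need nested objects, arrays, and optional fields.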

    • Dead_Internet_Theory@alien.topOPB

      The regex editor is something else and is about find/replace after the AI has generated stuff. GBNF is about restricting the AI to only generate specific stuff. Like imagine you want a yes/no answer and the AI is physically unable to answer anything but that. While Regex is more about “replace all instances of X with Y in the response”.
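To show what "physically unable to answer anything else" means in practice, here is a toy Python sketch of constrained decoding. It is only an illustration of the idea: real loaders work on token logits and a proper grammar parser, while this version uses a hard-coded pair of allowed answers and hypothetical ranked candidate lists in place of a model.

```python
# Mirrors the one-line grammar:  root ::= "yes" | "no"
ALLOWED = ("yes", "no")


def is_valid_prefix(text: str) -> bool:
    """True if text could still grow into one of the allowed answers."""
    return any(answer.startswith(text) for answer in ALLOWED)


def constrained_decode(ranked_tokens_per_step) -> str:
    """Greedy decoding that skips any token the 'grammar' forbids.

    ranked_tokens_per_step: one list per step, holding candidate tokens
    sorted from most to least likely (a stand-in for model output).
    """
    output = ""
    for ranked in ranked_tokens_per_step:
        if output in ALLOWED:
            break  # the answer is complete, stop generating
        for token in ranked:
            if token and is_valid_prefix(output + token):
                output += token
                break
        else:
            raise ValueError("no candidate token keeps the output valid")
    if output not in ALLOWED:
        raise ValueError("ran out of steps before the answer was complete")
    return output


if __name__ == "__main__":
    # The "model" would prefer to ramble, but only valid tokens get through.
    steps = [["Well", "Maybe", "y", "no"], [", it depends", "es", "eah"]]
    print(constrained_decode(steps))
```

The filtering happens before each token is chosen, which is the difference from a regex find/replace that only runs after the text already exists.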

  • nderstand2grow@alien.topB

    I think this is only available on llama.cpp. I’ve been using it for a while for simple structured outputs and am extremely happy with the results. With OpenAI’s function calling, I always had to write validators: first to make sure the output is indeed JSON, and then another to make sure the JSON complies with my JSON schema. A grammar makes all of that redundant, because the output is guaranteed to match the format the grammar describes (including JSON).
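For context, the two validators described above look roughly like this in Python. The schema, field names, and `parse_reply` function are hypothetical and only use the standard library. With a grammar in place, both checks should in principle never fail, assuming generation isn't cut off by a token limit; checks on the values themselves (ranges, known city names, and so on) are outside what a grammar can express easily and would still be needed.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"city": str, "days": int}


def parse_reply(raw: str) -> dict:
    """Validate a model reply the manual way, without any grammar."""
    # Validator 1: is the reply JSON at all?
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    # Validator 2: does the JSON have the expected shape?
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for field, expected in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # Strict type check so that true/false is not accepted as an int.
        if type(data[field]) is not expected:
            raise ValueError(f"field {field!r} should be {expected.__name__}")
    return data


if __name__ == "__main__":
    print(parse_reply('{"city": "Oslo", "days": 3}'))
```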

    • Dead_Internet_Theory@alien.topOPB

      Yeah, I didn’t even think this was possible, but it makes for a much safer way to do function calling! Like, imagine the pain of protecting against all the myriad exploits vs. just using this. It’s fantastic.

      And yeah I can only use it in llama.cpp for some reason too, but I got the impression _HF should have it.