I have a large corpus of notes that humans wrote to summarize articles. As they are, the notes give you the gist but are not very readable. I would like to use a generative model, ask it “please write a short sentence that will be nice to read describing the following facts”, and feed it the notes to obtain a brief, readable summary.

Language is Italian.

Suggestions on models or workflows?

Thanks

  • Kimononono@alien.topB
    1 year ago
    1. You’d probably want to embed the notes and then use cosine similarity to find the notes most similar to your input query (“please write a short sentence describing the following facts…”). Use the facts themselves as the query in the cosine similarity search.

    2. Then pass the similar notes to an LLM with an instruction like “please write a short sentence describing the following facts using the notes”.
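The two steps above can be sketched in plain Python. This is a minimal illustration, not a recommended setup: `embed` is a hypothetical toy bag-of-words stand-in for a real (ideally multilingual) embedding model, and the actual LLM call is left out, so `build_prompt` only returns the text you would send. The names `top_k_notes` and `build_prompt` are made up for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy 'embedding': lowercase word counts.
    Swap in a real embedding model in practice."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared words divided by the product of the vector norms.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_notes(facts: str, notes: list[str], k: int = 3) -> list[str]:
    # Step 1: rank the notes by similarity to the facts and keep the k best.
    query = embed(facts)
    ranked = sorted(notes, key=lambda n: cosine_similarity(query, embed(n)), reverse=True)
    return ranked[:k]


def build_prompt(facts: str, notes: list[str]) -> str:
    # Step 2: instruction + retrieved notes + facts, ready to send to an LLM.
    context = "\n".join(f"- {n}" for n in notes)
    return (
        "Please write a short sentence describing the following facts "
        "using the notes.\n\n"
        f"Notes:\n{context}\n\nFacts:\n{facts}"
    )


notes = [
    "Il governo approva la legge di bilancio",
    "La squadra vince il campionato",
    "Approvata la legge sul clima",
]
facts = "governo legge bilancio approvata"
best = top_k_notes(facts, notes, k=1)
print(best)
print(build_prompt(facts, best))
```

With a real embedding model you would precompute and store the note vectors once instead of re-embedding every note per query.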

    I don’t know how well embeddings work for Italian, so you may want to translate the notes to English and keep them in pairs (Italian version, English version). Then use the English version for the cosine similarity search (step 1) and the Italian version for the summarization (step 2).
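A rough sketch of the paired approach: the search runs on the English side of each pair, but the Italian text is what gets returned for summarization. It assumes the translation has already been done (the pairs are given), and `embed` is again a hypothetical toy bag-of-words stand-in for a real English embedding model; `search_pairs` is a made-up name.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy 'embedding' (word counts); replace with a real model.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_pairs(query_en: str, pairs: list[tuple[str, str]], k: int = 1) -> list[str]:
    """pairs holds (italian, english) tuples. Rank by similarity on the
    English side (step 1), return the Italian side for summarization (step 2)."""
    q = embed(query_en)
    ranked = sorted(pairs, key=lambda p: cosine_similarity(q, embed(p[1])), reverse=True)
    return [italian for italian, _ in ranked[:k]]


pairs = [
    ("Il governo approva la legge di bilancio", "The government approves the budget law"),
    ("La squadra vince il campionato", "The team wins the championship"),
]
print(search_pairs("budget law approved by the government", pairs))
```

The Italian notes returned here would then go into the summarization prompt, so the generated sentence stays in Italian.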