• CosmosisQ@alien.topB

      IANAL, but theoretically, it’s not possible to copyright model weights (at least in the US). While the licensing of large language models hasn’t been specifically tested in court, people have tried and failed with other machine learning models. The alleged copyright holder may refuse to do business with you in the future, but you’re unlikely to face legal repercussions.

  • Slimxshadyx@alien.topB

    Wow! Exciting! Are these uncensored models, or does the training data include refusals? Does anyone know? And what was Orca 1?

    • professorlust@alien.topB

      Given the legal challenges to the use of training data, you’re probably never going to see a major corporation publicly release the training data behind its LLM.

      There will be leaks from time to time, but no corporation will expose itself to litigation just to help the open-source community.

  • TheCrazyAcademic@alien.topB

    It’d be interesting to see how an MoE framework of multiple Orca 2s, each trained on a different subset of data, with your prompt routed to different Orca 2 experts, would fare. I feel like that could come extraordinarily close to GPT-4 on performance metrics, but it would take decent computing power to test the hypothesis. If each Orca 2 expert is 10 billion parameters and you wanted to run a 100-billion-parameter sparse Orca 2 MoE, that’s going to require 500 GB+ of VRAM at minimum.
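
    For scale, here is a back-of-the-envelope, weights-only estimate in Python. The precision assumptions are mine, and a real deployment also needs memory for KV cache, activations, and the router, which pushes the total well above the weights-only figure.

    ```python
    # Back-of-the-envelope, weights-only VRAM estimate for the sparse MoE idea
    # above. Assumptions are mine: every expert resident on the GPU, and no KV
    # cache, activations, or router overhead counted (those push the total up).
    def weights_vram_gb(total_params: float, bytes_per_param: int) -> float:
        """Gigabytes needed just to hold total_params weights at a given precision."""
        return total_params * bytes_per_param / 1e9

    total = 10 * 10e9  # ten 10B-parameter Orca 2 experts ≈ 100B parameters
    print(f"fp16: {weights_vram_gb(total, 2):.0f} GB")  # ~200 GB, weights only
    print(f"fp32: {weights_vram_gb(total, 4):.0f} GB")  # ~400 GB, weights only
    ```

    Whether the total lands nearer 200 GB or 500 GB+ mostly comes down to precision and serving overhead.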

  • PwanaZana@alien.topB

    Obvious question (and I’m assuming the answer is “we didn’t try it yet”): how does this model fare in terms of performance/output?

  • littlexxxxx@alien.topB

    The paper doesn’t address the question that really interests me: what reasoning strategy, and what corresponding system instruction, was used for each sub-task, and how was the strategy selected for each clustered sub-task (manually, or via prompts leveraging the OpenAI API)?

    If they did that part by hand, then this paper isn’t insightful or useful at all.

  • xplode145@alien.topB

    Can someone give me an ELI5 version of how to train Orca 2 on my local data files/folders? Pretty please.
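
    A minimal sketch of one way to do it, assuming supervised fine-tuning with Hugging Face Transformers plus LoRA adapters from PEFT. The model id microsoft/Orca-2-7b is the public Hugging Face release; the my_data/*.txt glob and the hyperparameters are placeholders for your own setup, and in practice you’d likely add quantization (e.g. QLoRA) to fit consumer VRAM.

    ```python
    # Minimal LoRA fine-tuning sketch for microsoft/Orca-2-7b on local .txt files.
    # The folder my_data/ and all hyperparameters here are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "microsoft/Orca-2-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # Wrap the base model so only small LoRA adapter weights are trained.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    # Load every .txt file under my_data/ and tokenize it.
    dataset = load_dataset("text", data_files={"train": "my_data/*.txt"})["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
        batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="orca2-local-finetune",
                               num_train_epochs=1,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               logging_steps=10),
        train_dataset=dataset,
        # mlm=False makes the collator build next-token-prediction labels.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```

    Note this treats your files as raw text for causal-LM training; if your data is question/answer pairs, you’d format each example into a prompt/response template first.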

  • visarga@alien.topB

    Tried the models: the 13B is very slow, and the 7B is speedy but a little quirky. It made a plan for how to solve the task but didn’t actually proceed to solve it. It doesn’t have good conversational flair.

  • LinuxSpinach@alien.topB

    Progressive Learning: We start with LLaMA-2-7B or LLaMA-2-13B checkpoint and finetune it on the train split of FLAN-v2 dataset for one epoch. Note that FLAN-v2 dataset contains both zero-shot and few-shot problems. We then train on 5 million ChatGPT data from Orca 1 for 3 epochs. Then we train on the combination of 1 million GPT-4 data from Orca 1 and Orca 2’s 817K data for 4 epochs.
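
    In other words, the quoted schedule is three supervised fine-tuning stages run in sequence. A tiny sketch of that structure (the dataset labels and the finetune() helper are placeholders, since the actual pipeline isn’t released):

    ```python
    # Tiny sketch of the quoted progressive-learning schedule as three ordered
    # fine-tuning stages. Dataset labels and the finetune() helper are
    # placeholders; the actual training pipeline isn't released.
    STAGES = [
        # (data used in the stage,                             epochs)
        ("FLAN-v2 train split (zero-shot and few-shot)",       1),
        ("5M ChatGPT responses from Orca 1",                   3),
        ("1M GPT-4 responses from Orca 1 + Orca 2's 817K set", 4),
    ]

    def finetune(checkpoint: str, data: str, epochs: int) -> str:
        """Placeholder: fine-tune `checkpoint` on `data` and return the new checkpoint."""
        print(f"fine-tuning {checkpoint} on {data!r} for {epochs} epoch(s)")
        return f"{checkpoint} + {data}"

    checkpoint = "LLaMA-2-7B"  # or LLaMA-2-13B, per the quote
    for data, epochs in STAGES:
        checkpoint = finetune(checkpoint, data, epochs)
    ```

    Per the quote, the later stages, which use the GPT-4 and Orca 2 data, get the most epochs.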