• wryso@alien.topB

    The most plausible explanation is that the board found out something that, given their fiduciary duties, left them no other choice. I’m betting OpenAI trained their next-generation models on a ton of copyrighted data, and this was going to be made public or otherwise used against them. If the impact was hundreds of millions of dollars, or even over a billion (plus months of dev time), wasted on training models with limited commercial utility, I could understand the board having to act.

    It’s well known that many “public” datasets used by researchers are contaminated with copyrighted materials, and publishers are getting more and more litigious about it. If there were a paper trail establishing that Sam knew but said to proceed anyway, they might not have had a choice. And there are many parties in this space who probably have firsthand knowledge (from researchers moving between major shops) and who are incentivized to strategically time this kind of torpedoing.

    • cuyler72@alien.topB

      Multiple court cases have already decided that training AI models on copyrighted material is fine and not copyright infringement, though.

      • wryso@alien.topB

        This is far from fully litigated. And even if the derivative works created by generative AI trained on copyrighted material don’t infringe the copyrights of the original works’ owners, that doesn’t mean:

        • companies can just use illegally obtained copyrighted works to train their AIs
        • companies are free to violate their contracts, whether agreements they’re directly a party to or implicit ones, like the instructions in robots.txt during a crawl
        • users of the models these companies produce are free from liability for the data-inclusion decisions behind the models they use

        So I’d say that the data question remains a critical one.
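        As an aside on the robots.txt point: those crawl instructions are machine-readable, so a crawler that ignores them is ignoring an explicit, checkable opt-out. A minimal sketch with Python’s standard-library `urllib.robotparser` (the robots.txt content and bot names here are hypothetical examples, not any real site’s policy):

        ```python
        from urllib import robotparser

        # Hypothetical robots.txt a publisher might serve to opt out of AI crawlers
        # while still allowing everyone else.
        robots_txt = """\
        User-agent: GPTBot
        Disallow: /

        User-agent: *
        Allow: /
        """

        rp = robotparser.RobotFileParser()
        # parse() takes the file's lines directly, so no network fetch is needed here.
        rp.parse(robots_txt.splitlines())

        # The AI crawler is barred from the whole site; a generic agent is not.
        print(rp.can_fetch("GPTBot", "https://example.com/article"))        # False
        print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
        ```

        A crawler that calls `can_fetch()` before every request and skips disallowed URLs is trivially cheap to build, which is part of why ignoring robots.txt looks bad in these disputes.
        
        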