@UserMinusOne - Power User


Joined 2 years ago

Cake day: October 31st, 2023



UserMinusOne@alien.top to LocalLLaMA • RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models
2 years ago
How much free space is required to do a “git clone …”?

Is there a better way to download the data without needing extra space for the history (.git)? If so, how big is the whole dataset?
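A rough way to think about the space question, assuming (not confirmed in the post) that the dataset repository stores its files with Git LFS: a default LFS clone keeps each file twice, once under `.git/lfs/objects` and once as the checked-out copy in the working tree, so it needs roughly twice the dataset size, while a direct download without a `.git` directory needs about the dataset size itself. The sketch below encodes that estimate and checks free space with `shutil.disk_usage`; the helper names and the 1 TiB figure are hypothetical, since the real dataset size is exactly what is being asked.

```python
import shutil

# Assumption: the repo uses Git LFS, which by default stores each object
# under .git/lfs/objects AND as a checked-out file in the working tree.
GIT_LFS_OVERHEAD = 2.0


def estimate_clone_bytes(dataset_bytes: int, via_git_lfs: bool = True) -> int:
    """Rough disk-space estimate for fetching `dataset_bytes` of data.

    A direct download (no .git directory) needs about the dataset size;
    a default Git LFS clone needs roughly twice that.
    """
    factor = GIT_LFS_OVERHEAD if via_git_lfs else 1.0
    return int(dataset_bytes * factor)


def has_room(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical size of 1 TiB; the actual dataset size is not known here.
    size = 1 << 40
    for lfs in (True, False):
        need = estimate_clone_bytes(size, via_git_lfs=lfs)
        label = "git-lfs clone" if lfs else "direct download"
        print(f"{label}: ~{need / (1 << 40):.1f} TiB needed, "
              f"enough free space here: {has_room('.', need)}")
```

The 2× factor ignores Git metadata and any filesystem deduplication, so treat it as a ballpark rather than an exact requirement.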

Given current developments, maybe someone should start collecting the raw data and serving it as torrents. … Just in case.
