UserMinusOne@alien.top to LocalLLaMA • RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models · 1 year ago
How much free space is required to do a “git clone …”?
Is there a better method to download the data without requiring additional space for the history (.git)? If yes, how big is the whole dataset?
Given the current developments: Maybe someone should start collecting raw data and serving it as torrents. … Just in case.