How to split a very large text or CSV file by a specific number of lines/rows

This posting will describe how to split a very large CSV or Text file into a number of smaller parts by specifying the number of desired lines within each of the resulting pieces (for example, by 65536 lines for use with Excel 2003 or 1,048,576 for Excel 2007). The method described will use free software to do this and will work for very large files (even files larger that 4 gigs in size), will avoid loading the entire file into memory when processing, will maintain the format/extension of the original file, and will not add any additional information to the resulting files.

This article was borne out of a need to split a large 600 meg CSV file into several pieces that were no longer than 100,000 rows each for uploading into a client’s website. The main problem we faced in doing so was that it was impossible to load the file into tools such as Excel (or course), text editors, or even Access because doing so would invariably cause any of those apps to crash. In these cases, it was possible to split the file using a freeware program called Gsplit, which processes the file sequentially without loading it into memory all at once. This posting will describe exactly how this is done.

A step by step guide:

Background: I am going to describe an example whereby I split a file called “chunk.csv” into several pieces that are a maximum of 10,000 rows each. This same process will work on any type of text file (for example TXT, INI, LOG, BAT, DIZ, BAK, and QUE) and you can customize this process to exactly the number of rows you require for your use.

Download and install Gsplit. This is the program that will perform the operation and it is 100% freeware, even for commercial use.

Run Gsplit.

Click “Original file” in the left sidebar. Click on the “browse” button under “file to split”. Browse to and click on the file you want to split.

Click “Destination folder” in the left sidebar. Click on the “browse” button under “destination path” on the right. Browse to the folder where you would like your output files to be saved.

Click “Type and Size” in the left sidebar. This is where all the interesting stuff will take place. Click on “Blocked Pieces” icon. Next, from the dropdown select “I want to split after the nth occurrence of a specified pattern”; and from the dropdown under that “Split after the occurrence number”. Enter the number of lines you want to split underneath (10,000 in this case, but you can enter the value that you need). Lastly, make sure that the pattern to use for splitting is “0x0D0x0A” (without quotes). This value is the hex code for a carriage return, and it should be displayed by default.

Click “Filenames” in the left sidebar. In the “piece name mask” field enter “{ofw}_{num}{ore}”(without quotes). This will generate pieces with the original file extension and the original file names that look like “filename_1.csv”. Alternately, you could leave this alone altogether, go with the default generated names, and simply rename the extension to “.csv” (or whatever your original extension is) in Windows explorer.

Click “Other Properties” in the left sidebar. check “do not add tags to piece files”, This will ensure that no additional information will be added to your original data.

You’re now ready to split. Click on the “Split” button in the upper toolbar. Your file should be split within seconds. Once the splitting process is finished you will see the “Splitting log” screen. Click on the “Open the folder in Windows Explorer” link to instantly jump to the output folder.

Optional: save your splitting profile. If you are going to repeat this splitting process in the future you might consider saving these settings as a “profile” that can be loaded when you need it so that you do not have to go through these process again. To do so select “Save a Profile As” from the “File” Menu.

Had similar issues recently with parsing proxy log. Not THAT large but still ~480000 lines. 🙂

I ended up using Jet OLE connection and hooked it into AnySQL Maestro. Treating text file as SQL database felt weird at first but worked surprisingly well.

SOA

Nice write up. Good utility.

John

I don’t think whoever wrote this article used the application in question to split the 600 mb file using the “split occurance” function. Sure it might work ok for smaller files(like from the picture 17 MB) but when you start trying to use it to split files of a 600 MB nature in this way it just doesn’t work.

This works wonderfully! I just completed splitting a number of huge text files (1 to 1.75 Gb). It took 2.5 mins to split the largest file into 4 files each with 2 million lines! 2 mil lines of my text file contents roughly translates to 500 Mb. It is still big for Notepad to open. But, MS Access can handle this easily.

I like the UI too besides the speed of the Gplit.

Thank you very much!

ronmagic

Worked fine with a 700 MB csv file with over 3.5 million rows. Split into 5 parts, each with 800K lines. Thanks for the great article.

Have to get used to it’s layout, but the first split was ok in 10 minutes. 12 files with a max of 20000 lines per file.

Bas

Kev Taylor

I had to split a 3.5 million line text file so I could open it in Excel. Could not get it to split with pattern function but used size split into 5 equal pices and it worked like a charm!

Samer

@ Kev: glad to hear it!

Rene

I needed to load a large – 36GB, 300 million rows – file into mysql. Tried many differnet ways but didn’t work. I came across your article, downloded GSplit followed the intstructions and have now got a fully populated table with no errors. Admittedly it took a while to load 30 files of 10 million rows each but I can’t complain at a load rate of around 8 mins per 100million rows.

Thanks for the article. Saved a lot of headaches and what is left of my hair.

Rajbardhan

Very Very Good Post.. I required this from Long Back

Thanks a lot SAMER KURDI

William

Thanks Samer Kurdi. Works like a charm for my files which go up to 600 mgb in size.

You can also use this (small) CSV splitter. There is possibility to choose encoding of input file or do it automaticaly. This feature very useful if intput file have non-english language.http://dfiles.ru/files/mzxb7faqx

Bogdan Tkachyk

You can also use this small CSVSplitter. It has nice feature to choose input file encoding, it’s very useful if file language is non-english (delete space before “.ru”) dfiles. ru/files/mzxb7faqx

Shobhit

Thanks you so much you saved my nights

Economy

For some txt files the pattern “0x0A” should be used instead of the pattern “0x0D0x0A” (without quotes)

Freesoul

Dude I am speechless. I can’t thank you enough for this information. I was searching this for many days. I tried cmd and other things, which are not my cup of tea, but to no avail. I tried to use CSV splitter as mentioned in a comment, but it didn’t perform function what I required, and then comes Gsplit which save my day and night 🙂 Moreover, I also like the caption at this website- Hope someday all software will be free. many many thanks again