Using Linux - Large Text Files

From Computer Science at Indiana State University

Latest revision as of 12:20, 29 September 2025

This page is part of the Linux and CS Systems - Getting Started guide. It assumes you have your computer set up to connect to the CS server, or have the appropriate software installed on your computer to run commands. Go back to the Linux and CS Systems Getting Started main page if you don't have your system set up yet.

Large Text File

On this page we walk through examining a text file that contains the complete works of Shakespeare (courtesy of Project Gutenberg). The file was downloaded from [1]. How many lines and words are in this file? The file is rather large to open in Word (you can try, but it takes a while to load), so instead we can use a few Linux commands to get information about it. Check the Linux and CS Systems Getting Started page for more commands that might be useful.

The sample session here shows how you can copy the file into your account on the CS server and check how many lines and words are in the file. If you would like to follow along and run these commands, first log in to the system and open a terminal.

First, we create a directory in our home directory for keeping the Shakespeare file.

jkinne@cs:~> cd ~
jkinne@cs:~> mkdir shakespeare
jkinne@cs:~> cd shakespeare
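
If you run the commands above a second time, mkdir reports an error because the directory already exists. A minimal sketch of a repeatable variant is shown here, assuming a scratch directory made with mktemp (the variable names base and created_dir are hypothetical, chosen only for illustration) so that it does not touch your real home directory.

```shell
# Scratch directory standing in for the home directory (an assumption for this sketch).
base=$(mktemp -d)

# -p tells mkdir to succeed quietly if the directory already exists.
mkdir -p "$base/shakespeare"
mkdir -p "$base/shakespeare"   # running it again is not an error

# Move into the new directory and show its name.
cd "$base/shakespeare"
created_dir=$(basename "$PWD")
echo "$created_dir"
```

On the CS server the equivalent would be mkdir -p ~/shakespeare followed by cd ~/shakespeare.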

Next we copy the text file from where it is stored on the CS server and use the wc command to see how many lines, words, and characters (bytes) are in the file. wc prints the three counts in that order, followed by the file name.

jkinne@cs:~/shakespeare> cp /var/junk/shakespeare.txt .
jkinne@cs:~/shakespeare> ls
shakespeare.txt
jkinne@cs:~/shakespeare> wc shakespeare.txt 
 124787  904061 5589890 shakespeare.txt
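
wc can also report each count on its own. The sketch below uses a small stand-in file (hypothetical, created just for this example) so it can be run anywhere; on the CS server you would point the same commands at shakespeare.txt instead.

```shell
# Small stand-in file; replace "$sample" with shakespeare.txt on the CS server.
sample=$(mktemp)
printf 'To be, or not to be,\nthat is the question.\n' > "$sample"

# Reading from standard input (<) keeps the file name out of the output.
# tr strips the padding that some versions of wc add around the number.
lines=$(wc -l < "$sample" | tr -d '[:space:]')   # -l counts lines
words=$(wc -w < "$sample" | tr -d '[:space:]')   # -w counts words
bytes=$(wc -c < "$sample" | tr -d '[:space:]')   # -c counts bytes

echo "lines=$lines words=$words bytes=$bytes"
```

For the stand-in file this reports 2 lines, 10 words, and 43 bytes.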

You can also use the nano text editor (or whatever text editor you like) to look through the text file.

jkinne@cs:~/shakespeare> nano shakespeare.txt

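You can use the head and tail commands to look at the start and end of a text file, and grep to look for particular lines in the file. By default head and tail print 10 lines, and -n changes that number. grep is case-sensitive unless you give it -i, and -m 3 makes it stop after three matching lines.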

jkinne@cs:~/shakespeare> head shakespeare.txt
The Project Gutenberg EBook of The Complete Works of William Shakespeare, by
William Shakespeare

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org

** This is a COPYRIGHTED Project Gutenberg eBook, Details Below **
**     Please follow the copyright guidelines in this file.     **
jkinne@cs:~/shakespeare> head -n 1 shakespeare.txt
The Project Gutenberg EBook of The Complete Works of William Shakespeare, by
jkinne@cs:~/shakespeare> tail -n 1 shakespeare.txt
*** END: FULL LICENSE ***
jkinne@cs:~/shakespeare> grep "Copyright" shakespeare.txt
what you can do with this work.  Copyright laws in most countries are in
jkinne@cs:~/shakespeare> grep -i -m 3 "Copyright" shakespeare.txt 
** This is a COPYRIGHTED Project Gutenberg eBook, Details Below **
**     Please follow the copyright guidelines in this file.     **
*This Etext has certain copyright implications you should read!*
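
A few more variations on these commands can be useful when exploring a large file. The following is a sketch only, run against a small stand-in file (hypothetical contents, created just for this example) rather than shakespeare.txt, so that the results are easy to check by eye.

```shell
# Small stand-in file; replace "$sample" with shakespeare.txt on the CS server.
sample=$(mktemp)
printf 'Copyright notice\nline two\nthis is COPYRIGHTED\nno match here\ncopyright again\n' > "$sample"

# -c prints the number of matching lines instead of the lines themselves.
exact=$(grep -c "Copyright" "$sample")

# Adding -i ignores case, so Copyright, COPYRIGHTED, and copyright all match.
any_case=$(grep -c -i "Copyright" "$sample")

# -n prefixes each match with its line number; -m 1 stops after the first match.
first=$(grep -n -i -m 1 "copyrighted" "$sample")

# head and tail can be chained to print a range, here lines 2 through 3.
middle=$(head -n 3 "$sample" | tail -n 2)

echo "exact=$exact any_case=$any_case"
echo "$first"
echo "$middle"
```

The same pattern, for example grep -c -i "Copyright" shakespeare.txt, tells you how many lines of the full file mention copyright in any capitalization.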