Run make zip in the VSCode terminal to create a zip archive of your work. Log in to Gradescope at https://gradescope.com and upload it to the Ngram Generator assignment on Gradescope. You can resubmit as many times as you like before the deadline.
For this assignment, you will implement a simple command line utility that reads in strings from standard input and produces a list of all the ngrams contained in the input. An ngram is a length-n sequence of characters, sometimes used for language modeling or other kinds of text analysis. Your program should print all of the ngrams in a given string; see the examples below for details.
You will need to complete this assignment using the provided starter code, and upload your code to Gradescope. Follow these steps to set up your working copy of this assignment:
$ mkdir -p ~/csc213/assignments ~/csc213/exercises ~/csc213/labs
Use the git command to check out a copy of the starter code for the assignment:
$ git clone /home/curtsinger/csc213/assignments/ngram ~/csc213/assignments/
Use the code command to open the starter code with Visual Studio Code:
$ code ~/csc213/assignments/ngram
VSCode should open with the ngram directory shown in the file browser. You may see a welcome message, which you can close. You can also close any prompts to upgrade to a new version of VSCode.
Run make in the terminal to build the starter code, or just type ctrl+shift+b to run the default build task (which just runs make).
We’ll use VSCode as the default editor for this class. You can use other editors if you prefer, but you’ll be missing out on some useful features. The VSCode projects I distribute will automatically format your C code, and will include some default settings that help with syntax highlighting, running build tasks, etc.
At this point you should read through the requirements for the assignment and review the provided code.
The best way to understand the expected output for this tool is to see an example.
The ngram program reads from standard input, which we can send to the program using the shell | operator.
This pipe takes the output of the preceding command and sends it as input to the next command.
We can generate output easily with the echo command, but the echo command adds a newline to the end of the string by default, so I will use it with the option -n to omit the newline in the examples below.
Here is a simple run of the ngram tool to produce all the trigrams of a given input:
$ echo -n "This is a test" | ./ngram 3
Thi
his
is 
s i
 is
is 
s a
 a 
a t
 te
tes
est
Note that the tool does not print “st” or “t” at the end of the run; it should only print complete trigrams. Spaces count as characters, so some trigrams begin or end with a space. If the input does not include enough characters to form any complete ngrams, the tool should print nothing. The ngram command should work for ngrams as short as 1, up to 2^31 - 1 (2147483647), the largest positive integer that can be represented in a signed 32-bit number (provided your computer has enough memory to hold an ngram that large and your operating system will allow you to request that much memory). In other words, you may not hard-code an upper limit on the number of characters in an ngram. Similarly, there is no upper limit on the length of the input to the program. That means you should not attempt to read all of the input and store it in the program at one time.
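As a way to check expected output by hand, an input of m characters contains m - n + 1 complete ngrams of length n when m is at least n, and none otherwise. The sketch below captures that rule; the name ngram_count is hypothetical and not part of the starter code, and the real program cannot call anything like it up front because it does not know the input length in advance.

```c
#include <stddef.h>

// Hypothetical helper (not part of the starter code): the number of complete
// ngrams of length n contained in an input of input_len characters.
size_t ngram_count(size_t input_len, size_t n) {
  // Too few characters to form even one complete ngram (or an invalid n)
  if (n == 0 || input_len < n) {
    return 0;
  }

  // Each starting position from 0 through input_len - n yields one ngram
  return input_len - n + 1;
}
```

For the trigram example above, "This is a test" has 14 characters, so ngram_count(14, 3) gives 12, matching the twelve lines of output.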
The starter code includes some basic validation of the single command line argument that specifies the length of the ngrams that should be collected. Please write your solution in the main function below this validation code. You are welcome to add helper functions, but please do not change or remove the code that validates the N parameter.
Please use reasonable coding practices in writing your solution;
that means you should write comments, indent your code properly, and use informative variable names.
You should also follow best practices in writing your solution;
that means you should free any memory you malloc, and make sure to check for errors returned by standard POSIX functions.
I strongly encourage you to search for POSIX functions that will help you construct your solution rather than writing everything from scratch.
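The error-checking advice above might look something like the following for malloc. This is only a sketch: alloc_buffer is a hypothetical helper name, and it assumes the requested size is at least 1 (malloc(0) is allowed to return NULL without that being an error).

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not part of the starter code): allocate size bytes,
// printing a message and exiting if the allocation fails.
// Assumes size >= 1. The caller is responsible for calling free on the result.
char* alloc_buffer(size_t size) {
  char* buffer = malloc(size);
  if (buffer == NULL) {
    // malloc sets errno on failure on POSIX systems, so perror can report it
    perror("malloc failed");
    exit(EXIT_FAILURE);
  }
  return buffer;
}
```

Every buffer obtained this way should be passed to free exactly once, after its last use and before the program exits.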
I highly recommend that you read input one character at a time using a function like fgetc;
this is a bit easier than reading multiple characters at a time, and should be just as fast.
One approach you should not use is to read in the entire input at once.
The input may be too large for you to store in memory at one time, so you will have to process it as you read it in.
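To illustrate reading one character at a time, here is a minimal sketch that only counts the characters in a stream without storing them. The function name count_chars is hypothetical, and this is not a solution to the assignment; it just shows the shape of an fgetc loop and how to tell end-of-file apart from a read error.

```c
#include <stdio.h>

// Hypothetical helper (not part of the starter code): count the characters
// available on stream, reading them one at a time.
// Returns the count, or -1 if a read error occurred.
long count_chars(FILE* stream) {
  long count = 0;

  // fgetc returns an int so that EOF can be distinguished from any valid byte
  int c;
  while ((c = fgetc(stream)) != EOF) {
    count++;
  }

  // EOF is returned both at end of input and on error; ferror tells them apart
  if (ferror(stream)) {
    return -1;
  }
  return count;
}
```

Calling count_chars(stdin) would process piped input of any length using a constant amount of memory, since each character is discarded as soon as it has been counted.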
You may have noticed that the provided Makefile builds with the -Wall and -Werror options.
These options tell the compiler to enable a broad set of common warnings, and to treat any warning as a build error.
I will use these options when building your code for grading, so make sure the code you submit does not produce any errors or warnings.
Here are a few additional example runs of ngram that may be useful for testing your implementation:
$ echo -n "Hello" | ./ngram 1
H
e
l
l
o
$ echo -n "Hi" | ./ngram 2
Hi
$ echo -n "Hi" | ./ngram 3
$ echo -n "213 is great" | ./ngram 2
21
13
3 
 i
is
s 
 g
gr
re
ea
at