Lab: Shell

Assigned

September 11, 2024

Due

September 18, 2024 11:59pm

Collaboration

All labs in this class should be completed with your assigned group. You may ask for assistance from the instructor or mentor, but you may not discuss any aspect of your work on the lab with other students in the class unless they are in your group.

Submitting

make zip

https://gradescope.com

Shell

group.txt

Overview

In this lab, you will implement a simple “shell” program. A shell is an interactive program that allows you to start and manage other programs, serving as a sort of intermediary between you, the OS, and the running processes. When you run a program in the terminal, you are actually interacting with the shell.

The shell you write for this program (called mysh) will support basic functionality that allows you to run programs in the foreground and background, and to move around the filesystem.

To obtain a copy of the starter code, one member of your lab group should perform the following steps:

Log in to a MathLAN machine.
Open a terminal window.
Use the git command to check out a copy of the starter code for the lab:
```
$ git clone /home/curtsinger/csc213/labs/shell ~/csc213/labs/shell
```
And now you can use the code command to open the starter code with Visual Studio Code.
```
$ code ~/csc213/labs/shell
```
A Visual Studio Code window should appear with the shell directory open in the file browser. You may see a welcome message, which you can close. You can also close any prompts to upgrade to a new version of VSCode.
Open a terminal inside of VSCode using the Terminal menu. By default, terminals appear at the bottom of the window. I find it more convenient to move it to the right side; just right-click somewhere near the top of the panel that appears and choose Move Panel Right.
Now you can run make in the terminal to build the starter code, or just type ctrl+shift+b to run the default build task (which just runs make).

As you work through this lab, make sure you understand and handle all of the error conditions for the POSIX functions you call. A component of your grade on this lab will be determined by your use of good coding style, including checking for and handling error conditions in a reasonable way. You are allowed to skip error checking for specific functions we’ve discussed in class, but when you’re unsure, please default to checking error codes if the man page says one could be returned.

Part A: Reading in Commands

The starter code includes a simple command prompt loop that reads in a line of text and prints it back out. The first step in actually executing these commands is to break them into pieces. Take a look at the manpage for execvp, which is the variant of exec we will use for this lab. This function takes in a command name or path and an array of arguments that ends with a NULL argument. Write code to break the command line string into an array of char*s ending with a NULL entry. You can do this with strsep or strtok_r. You may assume there will be no more than MAX_ARGS parts to any one command. Please use the MAX_ARGS constant defined in mysh.c so this number can be changed without replacing constants sprinkled throughout your code.

You do not need to support quoted arguments in your shell.

Part B: Launch a Process

Now that you can read in and break apart commands, use fork to create a child process, execvp to launch the command in the child process, and wait to have the shell wait for the child process to complete. The reason we are using the execvp function for this lab is so we do not need to implement path resolution, the process of searching through directories in the PATH environment variable to find an executable that matches the given command name.

The convention for exec is that the first argument passed to the program is the name of the program. For example, the command ls /home/curtsinger runs the program /bin/ls with the arguments ls and /home/curtsinger.

As soon as a child process finishes, print a message of the form [CMD exited with status N], where CMD is the name of the executable that ran (excluding any arguments), and N is the exit status. Make sure you review the man page for wait to see how you access the exit status of a child process; you will need to use the WEXITSTATUS macro. There are good examples of how to do this in the manpages.

Your shell should produce output like the following:

$ ls
group.txt Makefile mysh mysh.c
[ls exited with status 0]
$ grep
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--color=when]
        [--context[=num]] [--directories=action] [--label] [--line-buffered]
        [--null] [pattern] [file ...]
[grep exited with status 2]
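
One possible shape for the fork, execvp, and wait sequence in Part B is sketched below. The helper name run_foreground and its return convention are assumptions made for illustration; the sketch also assumes no other child processes are running, so a plain wait collects the child that was just started.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run args[0] with the NULL-terminated argument array `args`
// and block until it finishes. Returns the child's exit status, or -1 on failure.
int run_foreground(char* args[]) {
  pid_t child = fork();
  if (child == -1) {
    perror("fork failed");
    return -1;
  }

  if (child == 0) {
    // In the child: replace this process with the requested program
    execvp(args[0], args);
    // execvp only returns if it failed
    perror("execvp failed");
    _exit(EXIT_FAILURE);
  }

  // In the parent: block until the child finishes
  int status;
  if (wait(&status) == -1) {
    perror("wait failed");
    return -1;
  }

  // WEXITSTATUS is only meaningful if the child exited normally
  if (!WIFEXITED(status)) return -1;

  printf("[%s exited with status %d]\n", args[0], WEXITSTATUS(status));
  return WEXITSTATUS(status);
}
```

The child calls _exit rather than exit after a failed execvp so that it does not flush a copy of the parent's buffered output.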

            

Part C: Built-in Commands

Although cd exists as an executable in /bin, running it as a child process won’t work for our shell: the child changes its own working directory and then exits, leaving the working directory of the parent process (your shell) unchanged. Instead, your shell will need to check if the current command is cd. If it is, your shell should call the chdir function to change directories instead of calling fork and exec to run /bin/cd in a child process.

You should also add special handling for blank lines (which should do nothing) and the exit command.

Built-in commands and blank lines should not produce exit status messages. For example:

$ ls
group.txt Makefile mysh mysh.c
[ls exited with status 0]
$ cd ..
$
$ pwd
/home/runner
[pwd exited with status 0]
$ exit
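
The built-in check for Part C might be structured as in the sketch below. The helper name handle_builtin is hypothetical, and treating a bare cd with no directory as an error is an assumption of this sketch rather than a requirement of the lab.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: handle blank lines, cd, and exit inside the shell process itself.
// Returns true if `args` was handled here, false if the caller should fork and exec.
bool handle_builtin(char* args[]) {
  // Blank line: nothing to do
  if (args[0] == NULL) return true;

  if (strcmp(args[0], "cd") == 0) {
    // Assumption for this sketch: a bare `cd` is reported as an error
    if (args[1] == NULL) {
      fprintf(stderr, "cd: missing directory\n");
    } else if (chdir(args[1]) == -1) {
      perror("cd");
    }
    return true;
  }

  if (strcmp(args[0], "exit") == 0) {
    exit(EXIT_SUCCESS);
  }

  // Not a built-in command
  return false;
}
```

Since built-ins return before any fork happens, no exit status message is printed for them.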

            

Part D: Multiple Commands

Most shells also allow you to invoke multiple commands in sequence using a semicolon. For example, cd ..; pwd; ls will move up a directory, print the full path of that directory, then print the files in that directory. Modify your shell to support multiple commands chained together with a semicolon.

I recommend using the strpbrk function to split commands at semicolons. The strpbrk function works like strchr, except it searches for any one of a collection of delimiters. Alternatively, you can think of it as similar to strsep, except it does not modify the input string by overwriting the delimiter. This is particularly useful when you go on to the next part of the lab, which requires that you treat commands differently depending on what delimiter separates them from the next command.

The following code shows how you can use strpbrk to split a string at both periods and commas:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

void split_string(char* str) {
  char* current_position = str;
  while(true) {
    // Call strpbrk to find the next occurrence of a delimiter
    char* delim_position = strpbrk(current_position, ".,");

    if(delim_position == NULL) {
      // There were no more delimiters.
      printf("The last part is %s.\n", current_position);
      return;

    } else {
      // There was a delimiter. First, save it.
      char delim = *delim_position;

      // Overwrite the delimiter with a null terminator so we can print just this fragment
      *delim_position = '\0';

      printf("The fragment %s was found, followed by '%c'.\n", current_position, delim);
    }

    // Move our current position in the string to one character past the delimiter
    current_position = delim_position + 1;
  }
}

            

Once you have semicolon-separated commands, your shell should behave as follows:

$ ls; pwd
group.txt Makefile mysh mysh.c
[ls exited with status 0]
/home/runner/Lab-Shell
[pwd exited with status 0]
$ exit

            

Notice that the exit status for each command is printed before starting the next command. Make sure your shell prints statuses in the same way or the autograder will reject your implementation.

Part E: Background Commands

Add support for background commands, which are launched with the & symbol. The most common use case for background commands is to launch a single executable without blocking the current shell. For example, the command sleep 5 will block the shell for 5 seconds before a prompt is printed again, while sleep 5 & will run the sleep command in the background and allow you to run additional commands immediately.

It is also possible to run multiple commands with a single invocation separated by an ampersand, much like a semicolon. Unlike the semicolon separator, the ampersand will run the joined commands simultaneously instead of in sequence. You can also combine semicolon and ampersand delimited commands, with the following rules.

If the command is followed by an ampersand, it runs in the background. The shell immediately runs the next command or prompts for the next command if there isn’t one.
If the command is followed by a semicolon, it blocks the shell until the command finishes. When the command finishes, print the command’s exit status.
If the final command in the command line input does not have a semicolon or ampersand at the end, the behavior is the same as a command with a semicolon after it.

Hint: Don’t pass the ampersand or semicolon separator as an argument to the child process or built-in command.
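
The separator rules above can be sketched by recording, for each command, whether it was followed by an ampersand. The command_t type and the split_commands helper are hypothetical names introduced for this illustration. A line that ends with a separator produces one final empty command, which a shell can treat like a blank line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical record type for this sketch: one command and how it should be run
typedef struct {
  char* text;       // the command text, without its separator
  bool background;  // true if the command was followed by '&'
} command_t;

// Split `line` in place at ';' and '&'. Stores up to `max` commands in `out`
// and returns how many were stored.
size_t split_commands(char* line, command_t out[], size_t max) {
  size_t count = 0;
  char* current = line;

  while (current != NULL && count < max) {
    char* delim = strpbrk(current, ";&");
    out[count].text = current;

    if (delim == NULL) {
      // No trailing separator: treat like a command followed by ';'
      out[count].background = false;
      current = NULL;
    } else {
      // Remember which separator this was, then cut the string there
      out[count].background = (*delim == '&');
      *delim = '\0';
      current = delim + 1;
    }
    count++;
  }

  return count;
}
```

For the input sleep 1 & sleep 1; sleep 1, this yields three commands: the first marked as background and the other two as foreground. The separators themselves are overwritten, so they never reach the argument array.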

When you have one or more background commands running, the shell should print the exit status of any command that quits between command invocations. To check for exited background commands, you’ll need to use the waitpid function instead of wait, and pass in the WNOHANG flag to check for exited commands without blocking the shell. You’ll need to run this check in a loop to collect all of the exited background commands, not just one.

Print the exit status for background commands (commands followed by &) with a message of the form [background process PID exited with status N], where PID is the process ID of the background command, and N is the exit status, as with semicolon commands. For example:

$ sleep 1 & sleep 1; sleep 1
(shell pauses one second here)
[sleep exited with status 0]
(shell pauses one second here)
[sleep exited with status 0]
[background process 492 exited with status 0]
$
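
The non-blocking check for exited background commands could be written as a loop like the one below. This is a sketch under assumptions: the helper name reap_background is hypothetical, and it assumes every child still outstanding at this point is a background command.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: collect every background child that has already exited,
// without blocking. Returns the number of children reaped.
int reap_background(void) {
  int reaped = 0;

  while (1) {
    int status;
    pid_t pid = waitpid(-1, &status, WNOHANG);

    // A return value of 0 means children exist but none has exited yet
    if (pid == 0) break;

    if (pid == -1) {
      // ECHILD just means there are no children left to wait for
      if (errno != ECHILD) perror("waitpid failed");
      break;
    }

    if (WIFEXITED(status)) {
      printf("[background process %d exited with status %d]\n", (int)pid, WEXITSTATUS(status));
    }
    reaped++;
  }

  return reaped;
}
```

Calling a helper like this just before printing each prompt reports all finished background commands at once, rather than one per prompt.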

            

Extra Challenges: Subcommands, Pipes and Redirection

Another important feature of a shell is the ability to compose commands to perform more complex operations. These aren’t required for your implementation, but if you’d like an extra challenge you could add one or more of these features:

Subcommands, which allow you to wrap a command in backticks (`) to use the output of that command as part of another command line.
Redirection, which allows you to send the contents of a file as input to a command with <, or the output of a command to a file with >.
Pipes, which allow you to use the output of one command as the input to another command with |.

To implement these features, you may need the functions dup2, open, close, and pipe. These use some file-related functionality that we haven’t talked about in class. I am happy to discuss the extra challenges with you during office hours if you are interested in implementing these features.
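
As a starting point for output redirection, the sketch below shows how open and dup2 can point a child's standard output at a file before calling execvp. The helper name run_with_output, the file mode 0644, and the choice to truncate an existing file are all assumptions made for this illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `args` with standard output redirected to `path`.
// Returns the child's exit status, or -1 on failure.
int run_with_output(char* args[], const char* path) {
  pid_t child = fork();
  if (child == -1) {
    perror("fork failed");
    return -1;
  }

  if (child == 0) {
    // Open (or create) the output file, discarding any old contents
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
      perror("open failed");
      _exit(EXIT_FAILURE);
    }

    // Make file descriptor 1 (standard output) refer to the file
    if (dup2(fd, STDOUT_FILENO) == -1) {
      perror("dup2 failed");
      _exit(EXIT_FAILURE);
    }
    close(fd);

    execvp(args[0], args);
    perror("execvp failed");
    _exit(EXIT_FAILURE);
  }

  int status;
  if (waitpid(child, &status, 0) == -1) {
    perror("waitpid failed");
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection happens in the child after fork, so the shell's own standard output is left untouched.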

Questions & Answers

When I run ls in my shell it thinks I am passing the parameter "". What do I do?: Make sure you put NULL at the end of the args array you pass to execvp, not "\0".
How should the shell run with ampersands between commands?: You don’t have control over which background jobs run in which order, so the behavior may be a bit unpredictable. Your shell just has to worry about starting commands in the appropriate mode (wait or don’t wait). That behavior is based on the trailing character. If the subcommand ends in ;, wait for it to finish before moving on. If it ends in &, move on right away. The examples in the lab using the sleep command are the easiest way to test your implementation that I could come up with, but if you have alternatives I would be happy to see them.
When should the shell print the exit status for child processes?: Your shell should print the status of any exited child commands just before printing a new prompt ($) line. Make sure your shell prints the status for all exited children before the prompt, not just one. You will need to check for exited children in a loop until you find there are none.