Run make zip in the VSCode terminal to create a zip archive of your work. Log in to Gradescope at https://gradescope.com and upload the archive to the Archive Printer assignment. You can resubmit as many times as you like before the deadline.
For this assignment, you will write a small program that prints out the contents of a UNIX archive. This format, which typically uses the file extension “.ar”, is a predecessor of the .tar format. UNIX archives are also used to hold libraries you can link into compiled programs.
An archive contains one or more files using a relatively simple data layout. Your program will open an archive file (using some provided code) and print out each file in the archive. To read the archive format you will need to use some pointer manipulation techniques you probably have not seen before this course.
You can find the full details about the .ar format on Wikipedia, but you will only have to support a limited subset of the format for this assignment. The overall structure of a .ar file is a file signature followed by a file header, the data for that file, the next file header, that file’s data, and so on until the end of the file.
You will need to complete this assignment using the provided starter code, and upload your code to Gradescope. Follow these steps to set up your working copy of this assignment:
$ mkdir -p ~/csc213/assignments ~/csc213/exercises ~/csc213/labs
Run the following git command to check out a copy of the starter code for the assignment:
$ git clone /home/curtsinger/csc213/assignments/print-archive ~/csc213/assignments/
Use the code command to open the starter code with Visual Studio Code:
$ code ~/csc213/assignments/print-archive
VSCode should start with the print-archive directory open in the file browser. You may see a welcome message, which you can close. You can also close any prompts to upgrade to a new version of VSCode.
Run make in the terminal to build the starter code, or just type ctrl+shift+b to run the default build task (which just runs make).
We’ll use VSCode as the default editor for this class. You can use other editors if you prefer, but you’ll be missing out on some useful features. The VSCode projects I distribute will automatically format your C code, and will include some default settings that help with syntax highlighting, running build tasks, etc.
At this point you should read through the requirements for the assignment and review the provided code.
The file signature is the string !<arch> followed by a line feed character (code 0x0A).
Each .ar file begins with this signature so a program reading it can verify the format.
The starter code checks this file signature, so you can safely assume your code will only have to process .ar files.
You will still need to skip past these eight bytes (seven normal characters and a line feed).
Note that there is not a null terminator at the end of this string.
Also, keep in mind that the file signature only appears at the start of the .ar file, not before every file.
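As a small illustration of skipping the signature, here is a minimal sketch. The starter code already performs the check, so the helper names has_ar_signature and skip_ar_signature are hypothetical and only show how the eight bytes can be compared and stepped over without relying on a null terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// The signature is "!<arch>" plus a line feed: 8 bytes, no null terminator.
#define AR_SIGNATURE "!<arch>\n"
#define AR_SIGNATURE_LEN 8

// Hypothetical helper: returns true if the buffer starts with the signature.
bool has_ar_signature(const uint8_t* data, size_t file_size) {
  if (file_size < AR_SIGNATURE_LEN) return false;
  // memcmp compares exactly 8 bytes, so no terminator is required.
  return memcmp(data, AR_SIGNATURE, AR_SIGNATURE_LEN) == 0;
}

// Hypothetical helper: returns a pointer to the first file header,
// which begins immediately after the signature.
const uint8_t* skip_ar_signature(const uint8_t* data) {
  return data + AR_SIGNATURE_LEN;
}
```

Because data is a uint8_t pointer, adding 8 moves it forward by exactly 8 bytes.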
Each file header gives us a variety of important information about each file, although we’ll just need two pieces of the header: the file identifier (its name) and the file size. Here are all the fields in the file header along with their size in bytes.
File identifier (16 bytes): the file name, terminated by a / character, followed by spaces to fill the rest of the 16 bytes. File names longer than 15 characters are stored in a different way, but you do not need to support longer filenames.
File modification timestamp (12 bytes)
Owner ID (6 bytes)
Group ID (6 bytes)
File mode (8 bytes)
File size (10 bytes)
Ending characters (2 bytes): 0x60 and 0x0A.
The numeric fields are stored as text rather than as binary values. You can use sscanf, strtod, or atoi to convert these values to integers. Any unused bytes after the number are filled with spaces.
The actual contents of a file begin immediately after the ending characters of the file header. The number of bytes of file data is the file size, stored in the file header. There is no special character to mark the end of the file data.
One odd constraint of the .ar format is that file headers must always begin an even number of bytes away from the start of the file. The header itself is an even number of bytes, so if the file data has an odd length, one byte of padding follows immediately after the file data. This padding byte is not part of the file data.
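The header fields are fixed-width text with no null terminator, which matters when converting them. The sketch below copies a field into a small terminated buffer before converting it, and measures a file name by scanning for the / character. The helper names are hypothetical, and strtol is used here as a close relative of the functions mentioned above; this is one possible approach, not the required one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: convert a space-padded decimal field to a number.
// field points at the first byte of the field; width is its size in bytes.
long parse_decimal_field(const uint8_t* field, size_t width) {
  char buf[32];
  if (width >= sizeof(buf)) width = sizeof(buf) - 1;
  memcpy(buf, field, width);  // copy only the bytes that belong to the field
  buf[width] = '\0';          // add the terminator the file does not have
  return strtol(buf, NULL, 10);  // parsing stops at the first padding space
}

// Hypothetical helper: length of the name stored in a 16-byte identifier
// field, counting bytes up to (but not including) the terminating '/'.
size_t name_length(const uint8_t* ident) {
  size_t len = 0;
  while (len < 16 && ident[len] != '/') len++;
  return len;
}
```

Copying first keeps the conversion from reading past the end of the field, at the cost of a few extra bytes on the stack.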
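The even-offset rule can be sketched as a small offset calculation. This assumes the standard 60-byte .ar header (16 + 12 + 6 + 6 + 8 + 10 + 2 bytes); the constant and the helper name next_header_offset are hypothetical.

```c
#include <stddef.h>

// Assumed size of one file header in the common .ar layout.
#define AR_HEADER_SIZE 60

// Hypothetical helper: given the offset of a file header (in bytes from the
// start of the archive) and the size of that file's data, return the offset
// of the next file header.
size_t next_header_offset(size_t header_offset, size_t data_size) {
  size_t next = header_offset + AR_HEADER_SIZE + data_size;
  // Headers must start at an even offset, so skip one padding byte if needed.
  if (next % 2 != 0) next++;
  return next;
}
```

For example, the first header starts at offset 8, right after the signature. With 30 bytes of data the next header starts at offset 98, and with 29 bytes of data it also starts at offset 98 because of the padding byte.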
uint8_t?
Can '\0' appear in the contents of a file?
Your task is to implement the print_contents(uint8_t* data, size_t file_size) function in the starter code.
This function will be called with a pointer to the beginning of an archive file’s entire contents, along with the size of the archive file.
The function should print the name of each file followed by a newline, the contents of the file, and then another newline.
The sample inputs will include files that end in newlines, so there should be a blank line after each file’s contents.
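Neither the name nor the file data is null-terminated, so printing with a length is safer than printing with %s. Here is a minimal sketch of the output format for one file; print_entry and its parameters are hypothetical names, and the caller is assumed to have already located the name and the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: print one archived file in the required format:
// the name, a newline, the contents, and then one more newline.
void print_entry(FILE* out, const uint8_t* name, size_t name_len,
                 const uint8_t* contents, size_t size) {
  fwrite(name, 1, name_len, out);  // exactly name_len bytes, no terminator
  fputc('\n', out);
  fwrite(contents, 1, size, out);  // exactly size bytes, padding excluded
  fputc('\n', out);
}
```

Calling print_entry(stdout, name, name_len, contents, size) writes to the terminal. Passing the stream as a parameter is just a convenience for testing.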
To test your archive reader, you will need some input archive files.
The starter code includes an inputs directory that contains some simple test files.
You can create your own .ar file if you would like to test additional inputs with a command like this one:
$ ar rcs output.ar input1.txt input2.txt
This will create a new file named output.ar that should work with your reader, as long as you create the file on a Linux machine.
Like many old file formats, this one has many variants.
On macOS, the ar tool produces a slightly different version of the format that your program does not need to support.
The tool will work fine on the provided inputs even when you run it on a Mac, but if you want to make additional inputs you will have to do so on a Linux machine.
This assignment will almost certainly force you to manipulate pointers in an unfamiliar way.
We’re used to using pointers to access consecutive values of the same type: arrays.
This format instead intersperses headers with file data of variable length.
That means you’re likely going to need to do addition on pointers.
You can add constants to pointers, but it’s important that you understand how this works.
Adding 5 to an int* will add 5 * sizeof(int) bytes to the pointer.
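To make the scaling concrete, the sketch below measures how far a pointer moves, in bytes, when a constant is added to it. The helper byte_distance and the demo function are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: distance in bytes between two pointers.
// Casting to uint8_t* makes the subtraction count single bytes.
ptrdiff_t byte_distance(const void* from, const void* to) {
  return (const uint8_t*)to - (const uint8_t*)from;
}

// Demonstration: add the same constant to an int* and a uint8_t*.
void pointer_step_demo(void) {
  int values[8] = {0};
  int* ip = values;
  uint8_t* bp = (uint8_t*)values;
  // ip + 5 skips five ints; bp + 5 skips five bytes.
  printf("int* + 5 moves %td bytes\n", byte_distance(ip, ip + 5));
  printf("uint8_t* + 5 moves %td bytes\n", byte_distance(bp, bp + 5));
}
```

This is why a uint8_t pointer is convenient for walking through an archive: adding n to it always advances by exactly n bytes.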
Generally if you’re working with values of a known size you would use fixed-size types like uint8_t, uint16_t, uint32_t or uint64_t, which are guaranteed to be 8-bit, 16-bit, 32-bit, and 64-bit unsigned integers, respectively.
You might also want to create a struct to hold the file header data.
You can add fields that are the appropriate size for each entry, but your compiler might try to insert additional padding between fields in the struct.
To prevent this, you have to add the attribute __attribute__((packed)) to the struct definition.
For example, the following struct will almost certainly include hidden padding bytes to bring the size up to a more reasonable value (8 bytes seems likely):
struct somestruct {
  int x;
  char chars[3];
};
If we instead want this struct to be packed together with no extra space (so it matches a specification like the one for our file header), we could write:
struct __attribute__((packed)) mystruct {
  int x;
  char chars[3];
};
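One way to see the difference is to compare the two sizes with sizeof. This sketch assumes a compiler that supports the packed attribute, such as GCC or Clang; the two accessor functions are hypothetical and exist only to expose the sizes.

```c
#include <stddef.h>

// Same fields as above, without and with the packed attribute.
struct somestruct {
  int x;
  char chars[3];
};

struct __attribute__((packed)) mystruct {
  int x;
  char chars[3];
};

// Hypothetical accessors that report each struct's size in bytes.
size_t unpacked_size(void) { return sizeof(struct somestruct); }
size_t packed_size(void) { return sizeof(struct mystruct); }
```

The packed struct is exactly sizeof(int) + 3 bytes. The unpacked one is at least that large, and on a typical 64-bit Linux machine it is likely to be 8 bytes.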
If you skip this attribute your file header may contain unwanted padding bytes, so your reader will not access the correct values in the header. Whether you use a struct for the header or not, you will almost certainly need pointer math to get from one file header to the start of the next file header.
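As a sketch of how a header struct might look, the following uses the field widths of the common .ar header layout (16, 12, 6, 6, 8, 10, and 2 bytes). The type name, field names, and the header_data helper are hypothetical. Every field here is a byte array, so a compiler is unlikely to add padding even without the attribute, but the attribute and the size check make the intent explicit.

```c
#include <assert.h>
#include <stdint.h>

// Assumed layout of one .ar file header; all names are hypothetical.
typedef struct __attribute__((packed)) ar_header {
  uint8_t identifier[16];  // file name, '/'-terminated, space-padded
  uint8_t timestamp[12];   // modification time as text
  uint8_t owner_id[6];     // owner ID as text
  uint8_t group_id[6];     // group ID as text
  uint8_t mode[8];         // file mode as text
  uint8_t size[10];        // file size as text
  uint8_t ending[2];       // 0x60 followed by 0x0A
} ar_header_t;

// Fail at compile time if the struct does not match the 60-byte layout.
static_assert(sizeof(ar_header_t) == 60, "ar header must be 60 bytes");

// Hypothetical helper: the file data starts right after the header.
// header + 1 advances by sizeof(ar_header_t) bytes, not by one byte.
const uint8_t* header_data(const ar_header_t* header) {
  return (const uint8_t*)(header + 1);
}
```

Getting from the data to the next header still requires byte-level pointer math, since the data length varies from file to file.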
The inputs directory includes five archives, each containing one more file than the one before it. These files all contain text, and include a mix of even and odd sizes. Here are the expected outputs for each input file. Pay close attention to the number of blank lines. The padding between the end of a file’s data and the next file header is a newline character, so an incorrect implementation could print one additional newline after odd-sized files.
$ ./print-archive inputs/input1.ar
a.txt
Greetings from the file a.txt

$ ./print-archive inputs/input2.ar
a.txt
Greetings from the file a.txt

b.txt
Hello from b.txt as well!

$ ./print-archive inputs/input3.ar
a.txt
Greetings from the file a.txt

b.txt
Hello from b.txt as well!

c.txt
Yet another hello, this time from c.txt.

$ ./print-archive inputs/input4.ar
a.txt
Greetings from the file a.txt

b.txt
Hello from b.txt as well!

c.txt
Yet another hello, this time from c.txt.

d.txt
An again, here's a hello from d.txt.

$ ./print-archive inputs/input5.ar
a.txt
Greetings from the file a.txt

b.txt
Hello from b.txt as well!

c.txt
Yet another hello, this time from c.txt.

d.txt
An again, here's a hello from d.txt.

e.txt
This is getting a bit old, but here's e.txt.
