MAP Application Coding Challenge

Your task for this coding challenge is to write an anagram finder. The anagram finder takes a string as input and produces a list of strings that contain the same set of characters rearranged to produce different words or phrases. You will have quite a bit of flexibility in how your anagram finder works, but please read the instructions carefully to make sure you satisfy all the requirements. There is no starter code for this challenge; you’ll be writing everything from scratch. You can start the challenge any time you like, but be sure to submit your work by 11:59pm (Grinnell time) on March 8, 2021.

Here’s an example run of an anagram finder that should help you as you read the requirements below.

$ ./anagram "MAPs are great"
separate gram
agape armrest
a meager traps
a pager stream
errata gasp me
...

Requirements

Each section below describes the requirements for your anagram finder. If you have any questions you are always welcome to ask.

Development Environment

You are welcome to do your development on any machine you like, but make sure your code can build and run on Linux. Most of you will likely write your code to run on MathLAN, but if you use your own Linux machine please make sure to list any special dependencies I may need to install to run your code. If you are more comfortable working on macOS or Windows Services for Linux you’re welcome to use that. Just be sure to test regularly to make sure your code works on Linux.

Programming Language

You must implement the main executable of your anagram finder in C or C++ (if you know it). That doesn’t mean you can’t write code in other languages as part of your solution if you need to transform or organize a dictionary file or some other kind of pre-processing step, but that code should not need to run on every execution of the anagram finder. I am guessing that most students will not need to do any extra processing work, so most students will write all of their code in C or C++.

Please check with my by email if you are planning to include code in another language and you’re not sure whether it’s allowed.

Inputs

Your anagram finder should accept a single command line argument string that may contain spaces, punctuation, and both uppercase and lowercase letters. Discard any punctuation in the input, including spaces; these should have no effect on the behavior of your program. Treat uppercase and lowercase letters the same. As you can see in the example above, the “MAP” uppercase letters were substituted for all lowercase letters.

You are free to add reasonable restrictions on input (like maximum length), but you must include code in your program to check whether the input violates those restrictions. If your implementation receives an invalid input, you must print an informative error message and exit.

Outputs

The output from your anagram finder should consist entirely of words from a dictionary, with spaces between words. The file /usr/share/dict/words on Linux or macOS is a good choice, but you may use your own dictionary or data files if you prefer. You do not need to add capitalization or any punctuation other than spaces between words.

Make sure your output includes only true anagrams (see the Wikipedia article if you’re not sure what an anagram is). That means each output from your anagram finder should have the same number of occurrences of each letter in the input; for example, anagrams of “hello” should have exactly one “h”, one “e”, one “o”, and two “l”s.

Most non-trivial inputs will have hundreds or thousands of possible anagrams, but you do not need to find them all. I would be happy if your tool prints at least ten unique anagram outputs for an input that has at least that many possible answers, but you’re free to choose which anagrams to print and how to prioritize them. It’s also fine if your implementation prints more anagrams, as long as the output does not include duplicates. The input is technically an anagram of itself, so it’s okay if your anagram finder includes the original input as one of the anagrams it prints.

Code Quality

You are free to use any code formatting style you like, as long as the code is legible and your formatting is consistent. Please make sure to include enough comments that I can quickly determine what each function, loop, and conditional is doing. Please also try to break your code up into functions, especially if you repeat operations throughout your code. I specifically do not want to see implementations with hundreds of lines of code all in the main function. Use your judgement about how to divide up the code, and write detailed comments to make that division of code clear.

In addition to the style of your code, you should also ensure your program builds with no warnings, contains no memory errors, and checks return codes from any built-in functions (you can skip checks for malloc and printf, but all other built-in functions that can return error codes should be checked). Here are some examples of memory errors you should avoid:

  • reading a local variable without initializing it
  • reading from malloced memory without initializing it
  • writing beyond the end of an array
  • reading or writing malloced after it has been freed
  • freeing a pointer from malloc more than once
  • forgetting to free memory allocated with malloc

You may find it useful to build and run your program with AddressSanitizer to check for memory errors. To do this, pass the line -fsanitize=address to your compile command when you build the program. You can use the program as you normally would, but it will run a bit slower because it includes checks for memory accesses. If there are any errors, AddressSanitizer will print an error message that tells you the line of code where the error occurred, along with some other diagnostic information. You can read more about AddressSanitizer on its GitHub page.

Implementation Strategy

There are no strict requirements for how your implementation has to work, other than the rules about inputs and outputs above. Your implementation does not have to be fast, although it’s nice if it is. Your program could return ten interesting anagrams with lots of unique words, the first ten matches it finds, or every anagram it finds. You’re welcome to use any data structures you like, or you can operate directly on the dictionary file you use to find words.

Use your best judgement to select an approach that will work without too much effort on your part. I’d rather see an complete but imperfect implementation than a failed attempt at an overly-complicated approach. If you think of ways to improve your program but do not have a chance to implement them, you’re welcome to describe them in the README file you include with your completed challenge.

Documentation

In addition to your implementation, please include a README file that answers the following questions:

  1. How do I build your anagram finder? Be sure to include exact commands, and any extra dependencies I may need to install to build or run your code.
  2. How do I run your anagram finder? Make sure to cover any restrictions on inputs.
  3. How does your anagram finder work? Give a rough overview of the search process. It may be helpful to reference specific functions or lines in your code.
  4. If your anagram finder does not print all the possible anagrams, how does it decide which ones to print?
  5. Are there any quirks, limitations, future plans, or other details you think I should know about your work?
  6. Cite any outside sources you used to complete your implementation.

Submitting Your Work

Once you’ve finished your implementation, please do one last check for formatting issues, missing comments, and memory errors. Once that’s done, send me a .tar.gz or .zip archive of your work by email with the subject [MAP] Coding Challenge Solution. Include your README, code, data, and any build files (e.g. a Makefile) if you’re using them. You can assume the machine I use to test your code will have a dictionary at /usr/share/dict/words, but if you rely on a different data file please include that as well.

Be sure to send your solution in by 11:59pm (Grinnell time) on Monday, March 8th. That is one week after the application deadline.

Collaboration and Resources

Do not discuss any aspect of this coding challenge with anyone other than me. You may not discuss the details of the challenge, your approach, your code, or any issues you are having with other students, department tutors, class mentors, or other faculty.

If you use any resource other than this document, Linux man pages, or our discussions to complete your work, please include citations in your README file. Please do not use resources that describe how to solve this problem directly, and do not copy–paste large code fragments from outside sources. You’re welcome to read about data structures, search StackOverflow for technical problems you run into, and rely on other general sources as long as you cite them, but do not search for or use resources that describe methods for finding or checking anagrams. If I find that you used inappropriate resources for your work – cited or not – I will remove your application from consideration. If you’re not sure if a resource is appropriate, please ask me before using it.

I am happy to answer questions about the requirements for this challenge at any point. I’d also be happy to talk with you about your approach once you’ve spent some time working on your own. This challenge is meant to be a model of the kind of work I’ll be asking MAP students to complete during the summer, and our discussions will be a big part of the work we do together. We can discuss any aspect of the challenge you would like, but I want to see that you’ve spent sufficient time exploring and testing your solution before we meet. My availability may be a bit limited during the term, but if you send me an email with a short question and a few possible meeting times I will try to get back to you quickly with an answer or a time when we can meet synchronously.