Question
The program

The program you'll be writing for this project is one that can find and display the paths of all of the files in a directory (and, potentially, all of its subdirectories, and their subdirectories, and so on), and then take action on some of those files that have interesting characteristics. Both the notion of interesting characters and taking action will be configurable and each can work in a few different ways, but the core act of finding files will always be the same. One of your goals should be to avoid rewriting the same code over and over again (e.g., multiple functions that each perform a search for files with slightly different characteristics) whenever possible; in ICS 32, we begin to concern ourselves more strictly with design issues, keeping an eye on how best to solve a program, as opposed to writing something that "just works."

The input

Your program will take input from the console in the following format. It should not prompt the user in any way. The intent here is not to write a user-friendly user interface; what you're actually doing is building a program that we can test automatically, so it's vital that your program reads inputs and writes outputs precisely as specified below.

First, the program reads a line of input that specifies which files are eligible to be found. That can be specified in one of two ways:
The letter D, followed by a space, followed (on the rest of the line) by the path to a directory. In this case, all of the files in that directory will be under consideration, but no subdirectories (and no files in those subdirectories) will be. (You can think of the letter D here as standing for "directory.")
The letter R, followed by a space, followed (on the rest of the line) by the path to a directory. In this case, all of the files in that directory will be under consideration, along with all of the files in its subdirectories, all of the files in their subdirectories, and so on. (You can think of the letter R here as standing for "recursive.")
If this line of input does not follow this format, or if the directory specified does not exist, print the word ERROR on a line by itself and repeat reading this line of input; continue until the input is valid.
Next, the program prints the paths to every file that is under consideration. Each path is printed on its own line, with no whitespace preceding or following it, and with every line ending in a newline. Note, also, that the order in which the files' paths are printed is relevant; you must print them in the following order:
First, the paths to all of the files in the directory are printed. These are printed in lexicographical order of the file's names. (More on lexicographical order a bit later, but note that this is the default way that strings are sorted.)
Next, if the files in subdirectories are being considered, the files in each of the subdirectories are printed according to the same ordering rules here, with all of the files in one subdirectory printed before any of the others, and with the subdirectories printed in lexicographical order of their names.
Now that the program has displayed the paths of every file under consideration, it's time to narrow our search. The program now reads a line of input that describes the search characteristics that will be used to decide whether files are "interesting" and should have action taken on them. There are five different characteristics, and this line of input chooses one of them.
If this line of input is the letter A alone on a line, all of the files found in the previous step are considered interesting.
If this line of input begins with the letter N, the search will be for files whose names exactly match a particular name. The N will be followed by a space; after the space, the rest of the line will indicate the name of the files to be searched for.
Note that filenames include extensions, so a search for boo would not find a file named boo.doc.
If this line of input begins with the letter E, the search will be for files whose names have a particular extension. The E will be followed by a space; after the space, the rest of the line will indicate the desired extension.
For example, if the desired extension is py, all files whose names end in .py will be considered interesting. The desired extension may be specified with or without a dot preceding it (e.g., E .py or E py would mean the same thing in the input), and your search should behave the same either way.
Note, also, that there is a difference between what you might call a name ending and an extension. In our program, if the search is looking for files with the extension oc, a file named iliveinthe.oc would be found, but a file named invoice.doc would not.
If this line of input begins with the letter T, the search will be for text files that contain the given text. The T will be followed by a space; after the space, the rest of the line will indicate the text that the file should contain in order to be considered interesting.
For example, if this line of input reads T while True, any text file containing the text "while True" would be considered interesting.
One thing to note is that not all files are text files, but that you can't determine that by their name or their extension. Any file that can be opened and read as a text file is considered a text file for our purposes here, regardless of its name. Any file that cannot be opened and read as a text file should be skipped (i.e., it is not considered interesting).
If this line of input begins with the character <, the search will be for files whose size, measured in bytes, is less than a specified threshold. The < will be followed by a space; after the space, the rest of the line will be a non-negative integer value specifying the size threshold.
For example, the input < 65536 means that files whose sizes are no more than 65,535 bytes (i.e., less than 65,536 bytes) will be considered interesting.
If this line of input begins with the character >, the search will be for files whose size, measured in bytes, is greater than a specified threshold. The > will be followed by a space; after the space, the rest of the line will be a non-negative integer value specifying the size threshold.
For example, the input > 2097151 means that files whose sizes are at least 2,097,152 bytes (i.e., greater than 2,097,151 bytes) will be considered interesting.
If this line of input does not match one of the formats described above, print the word ERROR on a line by itself and repeat reading a line of input; continue until the input is valid. Note that it is not an error to specify a search characteristic that matches no files; it's only an error if this line of input is structurally invalid (i.e., it does not match one of the formats above).
Next, the program prints the paths to every file that is considered interesting, based on the search characteristic. Each path is printed on its own line, with no whitespace preceding or following it, and with every line ending in a newline. The paths should be printed using the same ordering rules as the last time you printed them (i.e., lexicographical ordering, as described above), though, of course, you will likely print fewer this time, since not every file will necessarily meet the search characteristic.
If there were no interesting files, the program ends; there is no action to take.
Now that we've narrowed down our search, it's time to take action on the files we found. The actions are to be taken on the files in the same order as you printed them previously. The program now reads a line of input that describes the action that will be taken on each interesting file. There are three different actions, and this line of input chooses one of them.
If this line of input contains the letter F by itself, print the first line of text from the file if it's a text file; print NOT TEXT if it is not.
If this line of input contains the letter D by itself, make a duplicate copy of the file and store it in the same directory where the original resides, but the copy should have .dup (short for "duplicate") appended to its filename. For example, if the interesting file is C:\pictures\boo.jpg, you would copy it to C:\pictures\boo.jpg.dup.
If this line of input contains the letter T by itself, "touch" the file, which means to modify its last modified timestamp to be the current date/time.
If this line of input does not match one of the formats described above, print the word ERROR on a line by itself and repeat reading a line of input; continue until the input is valid.
Once an action has been taken on all files, the program ends.
A few additional notes

Since one of the goals of this project is to introduce you to the use of recursion to solve real problems, the search that includes subdirectories and their subdirectories must be implemented as a recursive Python function that processes all of the files in a directory and then processes any subdirectories recursively. (Note that this rules out certain features of the Python standard library — searches like this are actually built into the library, but would circumvent the learning goal here. More on that later.)

Outside of the occurrence of symbolic links, which we're ignoring for the purposes of this project, directory structures are hierarchical (i.e., directories have subdirectories inside of them, and those subdirectories have the same basic structure as their "parents").

An example of the program's execution

The following is an example of the program's execution, as it should work when you're done. Boldfaced, italicized text indicates input, while normal text indicates output. The directories and files shown are hypothetical, but the structure of the input and output is demonstrated as described above.

To reiterate a point from earlier, your program should not prompt the user in any way; it should read input, assuming that the user is aware of the proper format to use.

R C:\Test\Project1\Example
C:\Test\Project1\Example\test1.txt
C:\Test\Project1\Example\test2.txt
C:\Test\Project1\Example\Sub\meee.txt
C:\Test\Project1\Example\Sub\test1.txt
C:\Test\Project1\Example\Sub\youu.txt
C:\Test\Project1\Example\Zzz\zzz.py
N
ERROR
N test1.txt
C:\Test\Project1\Example\test1.txt
C:\Test\Project1\Example\Sub\test1.txt
Q
ERROR
F
This is a line of text
Hello, my name is Boo
Portability

It should be possible to run your program on any operating system — Windows, macOS, Linux, etc. — so long as there is a Python 3.8 implementation for it. By and large, this doesn't require much special handling: Python is already reasonably independent of its platform. However, since you're dealing with a filesystem, you do have to be cognizant of how filesystems are different from one operating system to another. Most notably, paths are written differently:

On Windows, paths tend to be a drive letter, followed by a colon, followed by a sequence of directory names separated by backslashes, such as D:\Docs\UCI\ICS32\Homework.
On macOS, Linux, and other flavors of Unix, paths tend to be a sequence of directory names preceded by forward slashes instead, such as /home/student/uci/ics32/homework.
The way to isolate this distinction is to find tools in Python's standard library that isolate them for you. Don't store paths as strings; store them as path objects instead. (More on this below.)

Handling failure

You'll want to handle failure carefully in this program. In general, your program should not crash just because one activity fails; it should instead quietly skip the offending file or directory and continue, if possible. For example:

If, during the search, accessing some directory fails, the search should still continue attempting to access other directories.
If taking action on an interesting file fails, the program should continue processing additional interesting files. The only exception to this rule is when printing the first line of text from a file, in which case you would print NOT TEXT if the file is not a text file.
Organization of your program

There are a few things to note about how your program is organized.

Your program must be written entirely in a single Python module, in a file named project1.py. Note that capitalization and spacing (i.e., no letters in the filename are capitalized and there are no spaces) are important here; they're part of the requirement.
Executing your project1 module — by, for example, pressing F5 or selecting Run Module from the Run menu in IDLE — must cause your program to read its input and then print its output. It must not be necessary to call a function manually to make your program run.
Importing your module manually using an import statement should not cause your program to run. The way to differentiate between these scenarios is to write an if statement, outside of any function (and usually at the bottom of your module), that looks like this:
if __name__ == '__main__':
    the_things_you_want_to_happen_only_when_the_module_is_run
The expression __name__ == '__main__' will only return True within a module that was originally executed using F5 in IDLE; it will return False in any other module (e.g., in a module that has been imported instead).

A word about documentation

As you likely did in ICS 31, you are required to include type annotations and docstrings on every one of your functions. This will not only make your design clear to us when we're grading your project, but will also clarify your own design for yourself; details like this will help you to read and understand your own program. I find, as a general rule, that I will always write (or, at the very least, consider and understand) what the types my parameters and return value will be before I write any function; knowing the types is part of knowing what the function is supposed to do.

For example, if you were to write a function find_max that finds the maximum integer in a list of integers, you might start the function this way:

def find_max(numbers: [int]) -> int:
    '''Finds and returns the maximum number in a list of numbers'''
    ...
Other aspects of how this project (and others) will be graded are described on the front page of the Project Guide. Keep in mind, in particular, that part of your score will be based on how well your program is designed; the front page of the Project Guide describes this in more detail.

An important design goal

As you write your program, you'll want to be on the lookout for complex functions that can be made simpler by breaking them up into multiple smaller functions. Taking complexity and putting it into a function with a well-chosen name and clearly-named parameters is a great way to tame that complexity; this is the first step in building an ability to write significantly larger programs than you've written so far, so be sure that you're getting some practice in taking that step in this project. One of the criteria we'll use in grading your project is to assess the quality of its design; in this project, the largest factor affecting quality is the extent to which functions were broken up when they are long or cumbersome.

A good rule of thumb is to consider what you would say if I asked you "What does this function do?" If the answer is more than a sentence or so — and probably a short one, at that! — it's probably doing too much, and might better be implemented as multiple functions that each do part of the job. (One of the good things about writing docstrings in your functions is that it forces you to test this; if your docstring needs to be especially long, your function is probably too complex.)

Wait... what kind of crazy user interface is this?

Unlike programs you may have written in the past, this program has no graphical user interface, or even an attempt at a "user-friendly" console interface. There are no prompts asking for particular input; we just assume that the program's user knows precisely what to do. It's a fair question to wonder why there isn't any kind of user interface. Not all programs require the user-friendly interfaces that are important in application software like Microsoft Word or iTunes, for the simple reason that humans aren't the primary users of all software. As we'll see going forward in this course, much of what happens in software is programs talking directly to other programs; in cases like that, each program generally presumes that the other one is aware of the right way to communicate, and will give up quickly if the other program isn't ready to play by those rules.

Now consider again the requirements of the program you're being asked to write for this project. It waits for requests to be sent in via the console — though they could almost as easily be sent across the Internet, if we preferred, which we'll see in the next project — in a predefined format, then responds using another predefined format. Our program, in essence, can be thought of in the same way as a web server; it's the engine on top of which lots of other interesting software could be built. When an attempt in being made to search for a file, that could be coming from a web form filled out by a user, from another program that needs to perform a search, or from a human user in the Python shell.

While we won't be building these other interesting parts, suffice it to say that there's a place for software with no meaningful user interface; it can serve as the foundation upon which other software can be built. You can think of your program as that foundation.

Additionally, we're using a strategy like this one to assist us in automating the grading of the correctness of your project. Since everyone's input and output have to be formatted in the same way, we will be able to grade your output without manually looking at every line.
Solution Preview

These solutions may offer step-by-step problem-solving explanations or good writing examples that include modern styles of formatting and construction of bibliographies out of text citations and references.
Students may use these solutions for personal skill-building and practice.
Unethical use is strictly forbidden.

import os
import shutil

# Method to print all files in a list of files
def printFiles(files):
    for file in files:
       print(file)

# Method to retrieve all files in a directory, with recursive option
def getFiles(directory, recursive=False):
    files = []
    for dirname, dirnames, filenames in os.walk(directory):
       if (recursive == False and dirname == directory) or recursive == True:      
            for filename in filenames:
                files.append(os.path.abspath(os.path.join(dirname, filename)))
    return files

# Method to filter list of files by specified parameter
# N - filter files by file name
# E - filter files by file type
# T - filter files by text contents
# < - filter files by upper limit file size in bytes
# > - filter files by lower limit file size in bytes
def filterFiles(files, cmd, param):
    result = []
    for file in files:
       if cmd == "N" and param == os.path.basename(file):
            result.append(file)
       elif cmd == "E" and param == file.split(".")[-1]:
            result.append(file)
       elif cmd == "T":
            try:
                text = "".join(open(file).readlines())
                if param in text:
                   result.append(file)
            except:
                pass
       elif cmd == "<":
            try:
                size = int(param)
                if os.path.getsize(file) < size:
This is only a preview of the solution.
Please use the purchase button to see the entire solution.
By purchasing this solution you'll be able to access the following files:
Solution.zip
Purchase Solution
$60.00 $30
Google Pay
Amazon
Paypal
Mastercard
Visacard
Discover
Amex
View Available Computer Science Tutors 641 tutors matched
Ionut
(ionut)
Master of Computer Science
Hi! MSc Applied Informatics & Computer Science Engineer. Practical experience in many CS & IT branches.Research work & homework
5/5 (6,804+ sessions)
1 hour avg response
$15-$50 hourly rate
Pranay
(math1983)
Doctor of Philosophy (PhD)
Ph.D. in mathematics and working as an Assistant Professor in University. I can provide help in mathematics, statistics and allied areas.
4.6/5 (6,688+ sessions)
1 hour avg response
$40-$50 hourly rate
Leo
(Leo)
Doctor of Philosophy (PhD)
Hi! I have been a professor in New York and taught in a math department and in an applied math department.
4.9/5 (6,435+ sessions)
2 hours avg response

Similar Homework Solutions