Description
How could the content be improved?
The lesson is introduced, conceptually, as a realistic research project analysing data files. However, it almost immediately pivots into fairly abstract and arbitrary work on `thesis.txt`, extracts of Little Women, and random gene sequences of fictional creatures; these files are then scattered across a bunch of subdirectories. The lesson makes very little use of the actual data.
The exercises are also quite abstract, and heavily focus on multiple-choice questions based on "Look at this example directory tree", rather than making use of the actual directory trees in the data we have them download.
I think it'd flow a lot better if:
- Basic shell scripts were introduced very early - possibly straight after basics like using wildcards on the command line.
- Then, a lot of the multiple-choice questions could be replaced with 'write a shell script that...' exercises that use the actual data directories in the material, so people can poke around and explore to find the answer if they don't know it.
- Tools were then introduced with use cases for the actual data (see the sketch after this list), e.g.:
  - Using `find` to get a subset of files
  - Using `grep` to extract a particular ID/date/time of record from that file
  - Using `cut` to select a particular column
  - Using loops to repeat this for a particular set of parameters
  - Using shell script inputs to allow the user to specify the column
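
As a rough sketch of the kind of exercise solution I have in mind (the `data/` directory, the `*.csv` pattern, and the date being matched are hypothetical placeholders, not the lesson's actual files):

```bash
#!/usr/bin/env bash
# Rough sketch: combine find, grep, cut, a loop, and a script argument.
# data/, *.csv, and the '2015-11' date are placeholder names, not the
# lesson's real data files.

column=$1   # column number supplied by the user as the first argument

# Use find to get a subset of files (assumes filenames without spaces)
for file in $(find data/ -name '*.csv'); do
    echo "== $file =="
    # Use grep to pull out records for a particular date,
    # then cut to select the requested column
    grep '2015-11' "$file" | cut -d ',' -f "$column"
done
```

Learners would then run it as, e.g., `bash extract-column.sh 3` to pull out the third column from each matching file.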
There's a lot of use of `wc`, `sort`, `head -n` and `tail -n`, but I don't think they're that likely to be part of real pipelines. If selecting specific lines is required, then `sed -n` is the realistic option, whilst `head` and `tail` should be introduced for their typical use of peeking at files.
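
For example (the filename and line numbers here are just placeholders):

```bash
# head/tail for their typical use: peeking at a file
head -n 5 results.csv

# sed -n for the realistic pipeline task of selecting specific lines,
# e.g. lines 10 to 20
sed -n '10,20p' results.csv
```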