I recently launched the sourcecaster with James Baker as a community resource that shows how to use the command line to meet common challenges when working with digital primary sources.
Commands fall into the following categories:
- casting – changing one type of data to another type (e.g. PDF to TXT for text analysis purposes)
- wrangling – manipulating and navigating data (e.g. removing punctuation, normalizing case)
- getting – grabbing data from various locations (e.g. web scraping all relevant images from portions of a website)
- managing – editing and managing your work with data (e.g. save command line history)
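
As a rough illustration of the wrangling category, here is a minimal sketch of removing punctuation and normalizing case with the standard `tr` utility. The function name `normalize_text` and the sample input are hypothetical, chosen for this example only, and are not taken from the sourcecaster itself.

```shell
# Hypothetical helper (not from the sourcecaster): read text on stdin,
# strip punctuation, then convert uppercase letters to lowercase.
normalize_text() {
  # tr -d '[:punct:]'              deletes every punctuation character
  # tr '[:upper:]' '[:lower:]'     maps uppercase letters to lowercase
  tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]'
}

# Sample input invented for illustration.
printf 'Hello, World!\n' | normalize_text
# prints: hello world
```

To run the same steps on a file, redirect it in and out, for example `normalize_text < input.txt > output.txt`, where both filenames are placeholders.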
Commands are broken down to show what each piece does, so it's easier to adapt them to your purposes. For more descriptive breakdowns, explore with explainshell.
The hope is that this resource will continue to grow. Feel free to contribute your solutions by emailing me directly or opening an issue on GitHub.
This work is adapted from the awesome folks behind ffmprovisr.