Spent an hour figuring out a few regexes for the BNC's XML files. This is turning out to be a very complicated essay. Eyes glazed over many, many times.
As awesome as CLI tools are, BBEdit still wins for experimenting: test the regexes there first (the "find all" window is really useful), then stick them into a script and let it hack away at 4GB of plain text.
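For the "stick it into a script" half, here is a minimal sketch of what that could look like in Python. It is not the actual script or the actual regexes: the pattern `WORD_RE` and the function `extract_words` are made up for illustration, and the pattern assumes word elements shaped like `<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>`, so swap in whatever you worked out in BBEdit. The point is only that reading line by line keeps memory use flat, however big the files are.

```python
import re
import sys

# Hypothetical pattern (an assumption, not the regex from this post):
# expects elements like <w c5="NN1" hw="corpus" pos="SUBST">corpus </w>
WORD_RE = re.compile(
    r'<w\s+c5="([^"]+)"\s+hw="([^"]+)"\s+pos="([^"]+)">([^<]*)</w>'
)


def extract_words(lines):
    """Yield (c5, headword, pos, text) for every match, one line at a time."""
    for line in lines:
        for match in WORD_RE.finditer(line):
            c5, headword, pos, text = match.groups()
            yield c5, headword, pos, text.strip()


if __name__ == "__main__":
    # Stream each file named on the command line instead of reading it whole,
    # so a multi-gigabyte corpus never has to fit in memory.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for c5, headword, pos, text in extract_words(handle):
                print(f"{text}\t{headword}\t{c5}\t{pos}")
```

Run it with the corpus files as arguments and redirect the tab-separated output to a file. A regex like this only sees one line at a time, so it will miss any element that spans a line break.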