Doing things the un*x way means keeping a supply of “go-to” tools for the various tasks that spring up during development. For me, that’s a lot of Bash, Python and Perl on the dev machine and, recently, Node.js on the server.
While scraping and acquiring multi-lingual (Unicode) data for a project, I had to make sure I kept the correct UTF-8 encoding all along the processing pipeline. One screw-up and it’s garbage out. Here are some tips: