I have this awful habit of testing concepts in poorly designed code with no logging, alerting, or comments, and then somehow it ends up in production. Not real production, but it fills a need, saves time and effort, and several other people want to use it, so it’s available for anyone who needs it. Then it breaks – why?? No idea, there are no logs. I don’t even always know when it broke.
Then the inevitable questions come along – who is using it, how often, and how much time will it take to fix it? Yes, pretty often, and it depends on why it broke. That’s the best answer I have, because all I really know is that it’s being used.
This is bad; more specifically, it’s my bad.
Now, we’re R&D here at CofenseLabs; we’re not supposed to be writing production code. We are the music makers, and we are the dreamers of dreams; we convert napkin doodles into radish dust prototypes fueled by coffee and forgotten lunch breaks. Planning ahead does not come easily when testing the unknowns.
That being said, the pattern I identified above persists, and I am going to address it now. I need a standardized environment, I need logging, I need monitoring, and I need alerting.
We decided a long time ago that we would be using py3 (after I wrote a proof of concept in 2.7 that accidentally became production). Probably a good time for me to use Python’s virtual environments. I’d never used them before, so why now? For those of you who know, skip ahead. For those of you who don’t: when I try to generate my requirements.txt using pip3 freeze, I currently get 113 lines of packages, most of which don’t need to be installed anywhere just to deploy one project on an AWS instance. I could pretty much just grep all my imports and make a list myself, but then I’d have no pinned versions without grepping the pip freeze output anyway… really just making a lot more work for myself. So yeah, it’s good practice to use Python virtual environments from the start.
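For anyone in the same boat, the whole dance is only a few commands (the directory and package names here are just placeholders for whatever the project actually needs):

python3 -m venv venv                 # create an isolated environment in ./venv
source venv/bin/activate             # use it for this shell session
pip install loguru statsd            # install only what this project needs
pip freeze > requirements.txt        # the freeze now contains just those packages, pinned

Now requirements.txt lists only what the project actually uses, with pinned versions, instead of 113 lines of whatever else happens to live on my workstation.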
import logging, right? On its own, that basically does nothing. Even after setting up a basic config, there’s no file output, the default format doesn’t have what I want even if there were, so I need to format it myself, and if there’s more than one .py file, I need to call logging.getLogger all the time. Nope, here’s a nice gem for everyone: https://pypi.org/project/loguru/
It is absolutely beautiful. Colors, default log line formatting that has what I want, easy logfile configuration, and it can still take standard Python log handlers (more on that in the alerting section). There’s even a decorator that will wrap an entire try/except around a function and print out the juicy bits about what the error actually was. I’m in love.
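A rough sketch of what I mean (the log file name and rotation size are just examples):

from loguru import logger

# Send everything DEBUG and above to a rotating log file, in addition to the
# colorized stderr output loguru gives you out of the box.
logger.add("myproject.log", level="DEBUG", rotation="10 MB")

@logger.catch  # logs the full traceback if anything inside blows up
def risky(numerator, denominator):
    return numerator / denominator

logger.info("starting up")
risky(1, 0)  # the ZeroDivisionError gets logged with all the juicy bits

One logger.add() call gets me the file output, and the @logger.catch decorator is the try/except wrapper I mentioned.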
Well, here’s a big goose egg in my history, and not the golden kind either. Monitoring has been anything from looking over the minimal logging, to being notified when it’s broken, to sending alerts and error messages over a Slack webhook. Nope, new year, new me (what an awful year, something ought to improve, right?). statsd. I’m new to it, so bear with me. There are probably hundreds of walkthroughs on configurations and setups, but the idea is statsd -> Graphite -> Grafana. Grafana has all the pretty charts and dashboards. Grafana can even be configured to send out alerts based on thresholds for being too busy or going silent on the wire. External monitoring of a project? Sign me up.
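Here is roughly what the instrumentation side looks like with the statsd Python client (the host, port, prefix, and metric names are made up for this example; point them at wherever your statsd daemon actually lives):

import time
import statsd

# Fire-and-forget UDP metrics to a statsd daemon, which feeds Graphite/Grafana
stats = statsd.StatsClient("localhost", 8125, prefix="myproject")

stats.incr("emails.processed")       # count an event
stats.gauge("queue.depth", 42)       # report a current value

with stats.timer("parse.duration"):  # time a block of work
    time.sleep(0.1)                  # stand-in for the actual work being timed

A couple of one-liners like these are all it takes to get every interesting event onto a dashboard, and since it’s UDP, the project keeps running even if the statsd daemon is down.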
Alright, so if I use Grafana I can set up alerting. But what if I want to know when a particular event occurs, on Slack, which goes to my phone and can also alert my team, just in case it’s critical-ish and I’m out for the day? https://pypi.org/project/slack-logger/ is probably my favorite right now. It’s pretty, it works, and since it is a standard logging handler (I said I’d get back to this), I can add it as a handler to loguru and set its log level individually.
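The wiring looks something like this. The webhook URL is obviously fake, and I’m going from memory on slack-logger’s handler class and its url/username arguments, so double-check against the project’s README; the loguru side, adding a standard handler as a sink with its own level, is the part I care about here:

from loguru import logger
from slack_logger import SlackHandler, SlackFormatter

# A standard logging handler that posts to a Slack incoming webhook
slack_handler = SlackHandler(
    url="https://hooks.slack.com/services/XXX/YYY/ZZZ",  # placeholder webhook
    username="my-service",
)
slack_handler.setFormatter(SlackFormatter())

# loguru accepts standard logging handlers as sinks, each with its own level,
# so only ERROR and above ends up pinging the team on Slack
logger.add(slack_handler, level="ERROR")

logger.info("this stays in the local logs")
logger.error("this one goes to Slack and my phone")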
Generally, I’m one to create one giant main function to prove something works before fussing over helper functions, modularity, or even speed – I just need to see that it works. Once it does work, then modularity and speed come in to play. I’m not going to change this. When it comes to modularity, I think I do pretty well. Break everything down into little functions, connect them together in a logical way, and leave the option open to replace any logical segment with one of similar function without having to make full code revisions on connected components. Where I’m really going to shift my focus here is making sure that when I want something to be fast, that I chose the most appropriate method to do so. I have been known to write optimized algorithms in C and wrap them as python modules, but there are other options, such as the threading and multiprocessing modules, asyncio (currently my favorite), concurrent futures, or some combination of the above. There are many ways to combine multiprocessing with asyncio to really get all of your cores glowing red hot, should the need arise.
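Here is what that looks like in practice, a minimal sketch of the asyncio hello-world example with a few loguru calls sprinkled in (the log file name is just a placeholder):

import asyncio
from loguru import logger

logger.add("asyncio_demo.log", level="DEBUG")

@logger.catch  # any exception inside the coroutine gets logged with a full traceback
async def hello_world():
    logger.info("hello_world coroutine started")
    print("Hello World!")
    logger.info("hello_world coroutine finished")

logger.debug("starting the event loop")
loop = asyncio.get_event_loop()
# Blocking call which returns when the hello_world() coroutine is done
loop.run_until_complete(hello_world())
loop.close()
logger.debug("event loop closed")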
The above is adapted from https://asyncio.readthedocs.io/en/latest/hello_world.html and then decorated with logging events to show how easy it is now (no statsd yet).
My other not-so-amazing habit is to just run everything in a screen session so that when it eventually breaks, I can log in, screen -r, and see the error that broke everything. There’s scrollback, it’s quick and easy, and if I lose my SSH session for some reason, the screen remains active. That might be OK for testing, but before I walk away and let it run as a service, it should probably actually run as a service. SysVinit exists but is older, as does Upstart for that matter, but new systems get a new OS, and it will have systemd as its initialization and service manager. It’s not hard; it’s just one more step to leave something in a state where it will restart when it breaks and can easily be turned on and off by my counterparts. Below is the blank template I will be using to set services to run as services.
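In its most minimal form it looks something like this (the description, user, and paths are placeholders to fill in per project):

[Unit]
Description=My project description
After=network.target

[Service]
Type=simple
User=someuser
WorkingDirectory=/opt/myproject
ExecStart=/opt/myproject/venv/bin/python /opt/myproject/main.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Restart=always is the part that matters most to me: when the thing inevitably breaks, systemd brings it back up instead of waiting for me to notice.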
Just copy this into /etc/systemd/system/ as servicename.service and you can use systemctl [start/stop/enable/disable] servicename to your heart’s desire.