Tuesday, July 22, 2014

Django QuerySets are lazy

https://docs.djangoproject.com/en/dev/topics/db/queries/#querysets-are-lazy
I've known this for quite some time, but never had the chance to really use it until recently, when I was going over some old code.  Someone had asked me to implement a search feature where objects are searched by some of their attributes.  This was back when I was just starting out in Python, so the search was implemented poorly: a lot of the filtering was done in the application itself, building filters was awkward, and QuerySets were being evaluated for intermediate results.  The logic basically consisted of a loop that would take a filter condition, create a QuerySet from it, combine the resulting QuerySets, and return them.

This worked OK, until recently someone pointed out that the proper way would be to AND all the filters together, or at least provide an option to do so.

Knowing that QuerySets are lazy, I knew it was safe to come up with something like this:

# xs holds the values to filter on; each step chains another .filter() onto the QuerySet
reduce(lambda acc, x: acc.filter(criteria=x), xs, myobj.objects.all())

This would repeatedly apply the filter and also allow me to construct filters on the fly.

Oh, but you might say: what if you are not searching by a field called criteria?  You can actually build both the field name and the value you are searching for dynamically, but rather than me copying and pasting someone else's work, just read this for further explanation: http://www.nomadjourney.com/2009/04/dynamic-django-queries-with-kwargs/
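
For completeness, here is a minimal sketch of that kwargs trick combined with the reduce approach above.  The Entry model, the name field, and the terms list are made up purely for illustration:

from functools import reduce  # a builtin on Python 2, needs the import on Python 3

# Entry is a hypothetical model with a "name" field; terms are hypothetical search values.
terms = ["foo", "bar"]
field_name = "name"

# Build the lookup keyword (name__icontains=<term>) dynamically for each term
# and chain one .filter() call per term; nothing hits the database yet.
queryset = reduce(
    lambda acc, term: acc.filter(**{field_name + "__icontains": term}),
    terms,
    Entry.objects.all(),
)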

My whole point is that you can apply successive filters without any time penalty.  Not a single database query will be sent until you evaluate the QuerySet.
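
A quick way to convince yourself of this, assuming you are in a Django shell with DEBUG = True (so Django records the SQL it runs) and the same hypothetical Entry model, is to watch django.db.connection.queries:

from django.db import connection

qs = Entry.objects.filter(name__icontains="foo").filter(name__icontains="bar")
print(len(connection.queries))  # still 0 -- no SQL has been sent yet

results = list(qs)              # evaluation happens here, one query is issued
print(len(connection.queries))  # now 1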

Friday, February 28, 2014

on logging

I like to log, and I log in as much detail as I can.  My applications produce massive logs daily, and still I am not happy and keep tweaking my logging -- another variable into the output here, this needs to be logged as well, and so on.

If you have a distributed application, with function calls going up and down the stack and requests being forwarded to other components on the network, you need a unifying identifier; otherwise correlating and interpreting the logs will be very difficult.

How to tie logs together
How to pick an identifier and set it.  Let's say you have some application, be it a web app or a network app, and you want to be able to trace function calls and logs and connect them together.  At the first entry point, pick an identifier; if no meaningful identifier that pertains to the application exists (such as a DHCP-Transaction-Id), just pick a random one.  I find that a UUID works really well in this case.

At this point you have established an identifier, but how will you let everyone know what it is?  In a synchronous framework such as Django you may be able to just use some global variable which everyone can refer to.  However, in an asynchronous framework such as Twisted, you have no choice but to pass the ID around from function to function.  So just go ahead and make sure all your functions have a req_id parameter.  It also helps to have a uniform logging function to wrap around your log facility, something similar to

def log(logger, req_id, status, msg, extra=None):
    ...

extra is for anything that does not fit the other parameters: a variable or a dict that I make up on the spot when I want to print several things at once.  Call str() on it or use pprint to format it.
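
As a rough illustration, a minimal sketch of such a wrapper might look like this; the line format, the status tags, and the example values are my own assumptions, not something prescribed here:

import logging
import uuid
from pprint import pformat

def log(logger, req_id, status, msg, extra=None):
    # One uniform line: the request id ties entries together, status is a
    # short tag such as "enter" or "exit", extra is pretty-printed if given.
    line = "req_id=%s status=%s %s" % (req_id, status, msg)
    if extra is not None:
        line += " extra=%s" % pformat(extra)
    logger.info(line)

# Example usage
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
req_id = uuid.uuid4().hex  # identifier picked at the first entry point
log(logger, req_id, "enter", "allocating IP", extra={"subnet": "192.168.1.0/24"})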

What else to put there
In addition to all that, it is a good idea to let the person reading the logs know where a log line came from: the module, function, and line number.  Python's logging formatter can be configured to include all of that, but if you log through a wrapper, the formatter will report the function and line number of the wrapper itself, so I usually pull them out with inspect instead:

import inspect
caller = inspect.stack()[1]  # frame record of the wrapper's caller
func_name = caller[3]
line_no = caller[2]
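
Folded into the wrapper from the earlier sketch, the lookup could look roughly like this (the extra handling is omitted for brevity):

import inspect
import logging

def log(logger, req_id, status, msg, extra=None):
    caller = inspect.stack()[1]  # the function that called log(), not log() itself
    func_name, line_no = caller[3], caller[2]
    logger.info("req_id=%s %s:%s status=%s %s",
                req_id, func_name, line_no, status, msg)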

Where to print
EVERYWHERE!  I usually log when entering a function and when exiting: on entry I log the interesting parameters, and on exit I log what it is returning.  Every time a function does something, I log it: find an available IP in the subnet, log; ping that IP to see if it is free, log; create a record with the IP, log; the creation passed or failed, log.  For any other action, such as a database call, always log the result.  One day you will want to know why your app did something, and you will wish you had those logs.
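
One way to get the enter/exit logging without repeating yourself is a small decorator; this is my own sketch, and find_free_ip is a made-up example function:

import functools
import logging

logger = logging.getLogger(__name__)

def log_calls(func):
    # Log the arguments on entry and the return value on exit.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("exit %s returned=%r", func.__name__, result)
        return result
    return wrapper

@log_calls
def find_free_ip(subnet, req_id=None):  # hypothetical example function
    return "192.168.1.23"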

Log confidentiality
Logs usually go to local disk and then may get moved to some cheap storage; most logs are not treated as confidential.  If your application accepts passwords and you must log user input for certain things, I suggest replacing the password with **** or ####.  By no means do a 1:1 character replacement: you do not want to give a hint as to the length.  This way your logs do not have to contain confidential information.
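
A small scrubbing helper along those lines might look like this; the list of field names it masks is just an assumption:

def scrub(params):
    # Replace sensitive values with a fixed-length mask so the log gives
    # no hint about the real value or its length.
    sensitive = ("password", "passwd", "secret")  # assumed field names
    return {k: ("****" if k.lower() in sensitive else v)
            for k, v in params.items()}

print(scrub({"user": "bob", "password": "hunter2"}))
# {'user': 'bob', 'password': '****'}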

Logs without context are meaningless
Logs need to provide context, such as:
- I am in the process of finding an IP for as:bb:cc:dd:ee.
- Sent an offer with IP 192.168.1.23 to as:bb:cc:dd:ee.
- Received a NAK from as:bb:cc:dd:ee.
- About to write /etc/cron.d/job with artifact ID crown.fabdeed12da