Subordinate Python processes use their own instances of me to read logfiles.

Method __init__ Undocumented
Method file Opens a file (possibly a compressed one), returning a file object.
Method makeRecord This is where most of the processing time gets spent.
Method ignoreIPs The supervising process may call this with a list of IP addresses to be summarily rejected as I continue parsing.
Method __call__ No summary

Inherited from KWParse:

Method parseKW Undocumented

def __init__(self, matchers, **kw):

Undocumented

def file(self, filePath):

Opens a file (possibly a compressed one), returning a file object.
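
How compression is detected is not documented here. The following is a minimal sketch of one common approach, assuming the format can be inferred from the filename extension; open_logfile is a hypothetical name, not part of logalyzer.

```python
import bz2
import gzip


def open_logfile(file_path):
    """Hypothetical helper: open a plain or compressed logfile as text.

    Assumption: the compression format is inferred from the extension.
    """
    if file_path.endswith(".gz"):
        return gzip.open(file_path, "rt")
    if file_path.endswith(".bz2"):
        return bz2.open(file_path, "rt")
    # Anything else is treated as an uncompressed text file.
    return open(file_path, "r")
```

In every case the returned object can be iterated line by line, so the caller does not need to know whether the file was compressed.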

def makeRecord(self, line, alreadyParsed=False):

This is where most of the processing time gets spent.

Given one line of a logfile, returns one of the following three types of result:

  • For a bogus or ignored logfile line, None. This covers lines where no IP address was parsed, and lines whose rejection should not affect other entries from the IP address that was parsed.
  • For a rejected logfile line, a 2-tuple containing (1) a string with the dotted-quad form of the IP address whose behavior or source caused the line to be rejected from logfile analysis, and (2) False if we are only interested in ignoring its logfile entries, or True if the IP address's behavior was so egregious that it should be blocked from further web access as well as ignored in logfile analysis.
  • For an accepted logfile line, a 2-tuple containing (1) a datetime object and (2) a dict describing the record for the valid logfile line. The dict contains the following entries:
     ip:     Requestor IP address
     http:   HTTP code
     vhost:  Virtual host requested
     was_rd: True if there was a redirect to this URL
     url:    Requested url
     ref:    Referrer
     ua:     The requestor's User-Agent string.
    

The dict entry 'was_rd' indicates whether the vhost listed was the original vhost requested before a redirect. In that case the redirect-destination vhost isn't used, though it may be the same.

If my ignoreSecondary attribute is set and the line is a request for a secondary file (CSS or image), the line is ignored with no further checks.

If one or more HTTP codes are supplied in my exclude attribute, then lines with those codes will be ignored.
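
The three result types described above can be told apart by inspecting the value itself. The following is a minimal sketch of how a caller might do that, assuming the return shapes are exactly as documented; classify_result and its labels are hypothetical and not part of logalyzer.

```python
from datetime import datetime


def classify_result(result):
    """Hypothetical helper: label a makeRecord-style result."""
    if result is None:
        # Bogus or ignored line: nothing to do.
        return "ignored"
    first, second = result
    if isinstance(first, datetime):
        # Accepted line: (datetime, record dict).
        return "accepted"
    # Rejected line: (dotted-quad IP string, block flag).
    return "blocked" if second else "rejected"
```

Checking for a datetime first keeps the two kinds of 2-tuple from being confused, since both have the same length.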

def ignoreIPs(self, ipList):

The supervising process may call this with a list of IP addresses to be summarily rejected as I continue parsing.

def __call__(self, filePath):

The public interface to parse a logfile. My processes call this via the queue to iterate over misbehaving IP addresses and parsed dt-record combinations. The two types iterated are strings (misbehaving IP addresses) and 2-tuples (datetime, record). Either type may be yielded at any time, and the caller must know what to do with them.
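
Because the two types arrive interleaved, the caller has to sort them as they come. The following is a minimal sketch of such a consumer, assuming the yielded items are exactly the strings and 2-tuples described above; collect is a hypothetical name, and any iterable stands in for the parser here.

```python
def collect(results):
    """Hypothetical consumer: split an interleaved stream into IPs and records."""
    bad_ips = set()
    records = []
    for item in results:
        if isinstance(item, str):
            # A misbehaving IP address.
            bad_ips.add(item)
        else:
            # A (datetime, record) pair for an accepted line.
            dt, record = item
            records.append((dt, record))
    return bad_ips, records
```

Collecting the IP addresses in a set means an address reported more than once is only kept once.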

If the logfile does not specify a virtual host in CLF column #2, you can specify a vhost for the entire file on the first line. It can be prefixed with a comment symbol ("#" or ";") if you wish.
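
How the first line is actually recognized and parsed is not documented here. The following is a minimal sketch of just the comment-prefix handling, assuming the rest of the line is the bare vhost name; vhost_from_first_line is a hypothetical name, and the sketch does not try to decide whether the first line is a vhost line at all.

```python
def vhost_from_first_line(line):
    """Hypothetical helper: strip an optional leading "#" or ";" from a vhost line."""
    text = line.strip()
    if text[:1] in ("#", ";"):
        # Drop the comment symbol and any whitespace that follows it.
        text = text[1:].strip()
    return text
```

With this sketch, "# example.com", ";example.com" and "example.com" all give the same vhost string.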

API Documentation for logalyzer, generated by pydoctor at 2021-09-18 08:41:09.