logalyzer.logread.ProcessReader(KWParse)
class documentation
Part of logalyzer.logread
Subordinate Python processes use their own instances of me to read logfiles.
Method | __init__ | Undocumented |
Method | file | Opens a file (possibly a compressed one), returning a file object. |
Method | makeRecord | This is where most of the processing time gets spent. |
Method | ignoreIPs | The supervising process may call this with a list of IP addresses to be summarily rejected as I continue parsing. |
Method | __call__ | The public interface to parse a logfile. |
Inherited from KWParse:
Method | parseKW | Undocumented |
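The `file` method's summary says only that it opens a possibly compressed file and returns a file object. Below is a minimal sketch of one way to do that. It assumes compression is detected from the `.gz` and `.bz2` filename extensions, which may not be how logalyzer decides. `open_logfile` is a hypothetical name and is not part of logalyzer.

```python
import bz2
import gzip


def open_logfile(path):
    """
    Open a logfile for text reading, decompressing it transparently.

    Hypothetical helper: choosing the decompressor by file extension is
    an assumption made for illustration, not logalyzer's documented rule.
    """
    if path.endswith(".gz"):
        # gzip.open in "rt" mode returns a text-mode file object.
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    # Anything else is treated as an uncompressed text file.
    return open(path, "r")
```

Because every branch returns an ordinary text file object, callers can iterate over lines the same way whether or not the file was compressed.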
makeRecord
This is where most of the processing time gets spent.
Given one line of a logfile, returns one of the following three types of result:
- For a bogus or ignored logfile line, where either no IP address was parsed or other entries from the parsed IP address are not to be affected: None.
- For a rejected logfile line: a 2-tuple containing (1) a string with the dotted-quad form of the IP address whose behavior or source caused the line to be rejected from logfile analysis, and (2) False if we only want to ignore its logfile entries, or True if the IP address's behavior was so egregious that it should be blocked from further web access as well as ignored in logfile analysis.
- For an accepted logfile line: a 2-tuple containing (1) a datetime object and (2) a dict describing the record for the valid logfile line. The dict contains the following entries:
  - ip: Requestor IP address
  - http: HTTP code
  - vhost: Virtual host requested
  - was_rd: True if there was a redirect to this URL
  - url: Requested URL
  - ref: Referrer
  - ua: The requestor's User-Agent string
The dict entry 'was_rd' indicates whether the vhost listed was the original vhost requested before a redirect. In that case the redirect-destination vhost isn't used, though it may be the same.
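Since makeRecord can return three differently shaped results, a caller has to tell them apart. A minimal sketch of that dispatch follows, distinguishing the accepted case by checking whether the first tuple element is a datetime. `classify_result` and its string labels are hypothetical, and the sample record's value types (for example, an integer HTTP code) are assumptions; only the dict keys come from the description above.

```python
from datetime import datetime


def classify_result(result):
    """
    Sort one makeRecord() return value into the three documented cases.

    Returns (label, payload). The labels are hypothetical names chosen
    for this sketch, not values used by logalyzer.
    """
    if result is None:
        # Bogus or ignored line: nothing recorded, no IP affected.
        return ("ignored", None)
    first, second = result
    if isinstance(first, datetime):
        # Accepted line: (datetime, record dict).
        return ("accepted", second)
    # Rejected line: (dotted-quad IP string, block flag).
    return ("blocked" if second else "rejected", first)


# Illustrative record using the documented keys; the value types are assumed.
SAMPLE_RECORD = {
    "ip": "192.0.2.10",
    "http": 200,
    "vhost": "example.com",
    "was_rd": False,
    "url": "/index.html",
    "ref": "-",
    "ua": "curl/8.0",
}
```

For example, `classify_result((datetime(2024, 1, 1), SAMPLE_RECORD))` yields `("accepted", SAMPLE_RECORD)`, while `classify_result(("192.0.2.1", True))` yields `("blocked", "192.0.2.1")`.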
If my ignoreSecondary attribute is set and this is a secondary file (css or image), it is ignored with no further checks.
If one or more HTTP codes are supplied in my exclude attribute, then lines with those codes will be ignored.
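The two early-exit rules above (skip secondary files when ignoreSecondary is set, and skip any line whose HTTP code appears in exclude) can be sketched as a single predicate. `should_skip` and `SECONDARY_EXTENSIONS` are hypothetical, and the extension list is an assumption; logalyzer's own test for "css or image" may differ.

```python
# Hypothetical extension list standing in for "css or image" files;
# logalyzer's own definition of a secondary file may differ.
SECONDARY_EXTENSIONS = (".css", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


def should_skip(url, http_code, ignore_secondary, exclude):
    """
    Return True if a line should be ignored before any further checks.

    Mirrors the two documented rules: secondary files are skipped when
    ignore_secondary is set, and any HTTP code in exclude is skipped.
    """
    # Drop any query string so "/logo.png?v=2" still counts as an image.
    path = url.split("?", 1)[0].lower()
    if ignore_secondary and path.endswith(SECONDARY_EXTENSIONS):
        return True
    return http_code in exclude
```

Checking the cheap string and set-membership tests first keeps them ahead of the costlier per-line work, in line with makeRecord being where most of the processing time is spent.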
ignoreIPs
The supervising process may call this with a list of IP addresses to be summarily rejected as I continue parsing.
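One plausible way to hold the addresses passed to ignoreIPs is a set that each parsed line is checked against. The sketch below is a hypothetical stand-in (`IgnoringReader`, `ignore_ips`, and `is_ignored` are not logalyzer names) and shows only the bookkeeping, not how logalyzer actually stores or applies the list.

```python
class IgnoringReader:
    """
    Hypothetical stand-in showing how an ignore list could be kept.

    The supervisor adds addresses with ignore_ips(); later lines from
    any of them can then be rejected before further parsing.
    """

    def __init__(self):
        # A set gives constant-time membership checks on every line.
        self.ignored = set()

    def ignore_ips(self, ip_list):
        # Accumulate addresses; repeated calls extend the set.
        self.ignored.update(ip_list)

    def is_ignored(self, ip):
        return ip in self.ignored
```

Since the supervisor may call this at any time while parsing continues, lines already processed are unaffected; only later lines see the updated set.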
__call__
The public interface to parse a logfile. My processes call this via the queue to iterate over misbehaving IP addresses and parsed (datetime, record) pairs. The two types iterated are strings (misbehaving IP addresses) and 2-tuples (datetime, record). Either type may be yielded at any time, and the caller must know what to do with them.
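Because strings and 2-tuples can arrive interleaved, the consuming side must branch on each item's type. A minimal sketch of such a consumer follows; a plain iterable stands in for the queue that logalyzer actually uses, and `consume` is a hypothetical name.

```python
def consume(items):
    """
    Split a mixed stream from a reader into misbehaving IPs and records.

    Strings are misbehaving IP addresses; 2-tuples are (datetime, record).
    Either may arrive at any point, so each item is checked by type.
    """
    bad_ips = []
    records = []
    for item in items:
        if isinstance(item, str):
            # A bare string is an IP address flagged as misbehaving.
            bad_ips.append(item)
        else:
            # Otherwise unpack the (datetime, record) pair.
            dt, record = item
            records.append((dt, record))
    return bad_ips, records
```

Testing for `str` first is the simpler check, since a record is always a tuple and never a string.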
If the logfile does not specify a virtual host in CLF column #2, you can specify a vhost for the entire file on its first line, optionally prefixed with a comment symbol ("#" or ";").
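Reading a file-wide vhost from the first line could look like the sketch below. It assumes that a vhost line holds a single token, so a line containing spaces is taken to be an ordinary log entry. That rule, and the name `file_vhost`, are hypothetical and are not taken from logalyzer.

```python
def file_vhost(first_line):
    """
    Extract a file-wide vhost from a logfile's first line, or return None.

    Hypothetical helper: assumes the line holds only the vhost name,
    optionally preceded by "#" or ";" and surrounding whitespace.
    """
    text = first_line.strip()
    if text[:1] in ("#", ";"):
        # Drop the optional comment symbol and any space after it.
        text = text[1:].strip()
    # A lone token is taken as a vhost; anything with spaces (or nothing
    # at all) is assumed to be an ordinary log entry or a blank line.
    if len(text.split()) != 1:
        return None
    return text
```

With this rule, `"# example.com"` and `";www.example.org"` both yield a vhost, while a normal CLF line starting with an IP address yields None.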