Log Files

Introduction

Even small web sites such as mine produce a great deal of information in the form of log files. If left unattended and forgotten about, these files can get ginormous. What on earth can you do with a near 10Gb text file? In Windows, natively, barely anything.

directory listing of large log files

Luckily there are tools and techniques around that can manipulate these large files and so this page was written...

Notepad

So long as nothing else is using a lot of memory, Notepad can easily deal with the 347.8Mb file. If there is not much memory available, Notepad can take a while to load the file. It cannot open the 1.6Gb file. It gives the message "File is too large for Notepad. Use another editor to edit the file."

Use a Browser

If all that is needed is to view the file, to peek inside it, then these files can be opened in a browser. Chrome, Edge, and Opera can open the near 10Gb file. Firefox and Internet Explorer cannot.

Log Parser

Log Parser is a program created by Microsoft specifically to query log files. Log Parser is a command-line utility but there is a GUI for it, Log Parser Studio.

Here are some very simple example queries. In these examples "-i:ncsa" specifies the Common Log Format used by web servers.

Show all records from a file:

logparser -i:ncsa "SELECT * FROM 'E:\web server logs\originals\brisray.com-access-20210704.log'"

Show first 10 records from a file:

logparser -i:ncsa "SELECT TOP 10 * FROM 'E:\web server logs\originals\brisray.com-access-20210704.log'"

Write the records between two dates into a new file:

logparser -i:ncsa "SELECT INTO 'E:\web server logs\newfile.log' FROM 'E:\web server logs\originals\brisray.com-access-20210704.log' WHERE [DateTime] BETWEEN timestamp('2016/01/01', 'yyyy/MM/dd') AND timestamp('2016/02/01', 'yyyy/MM/dd')"

Show last 10 records from a file sorted by date:

logparser -i:ncsa "SELECT TOP 10 * FROM 'E:\web server logs\originals\brisray.com-access-20210704.log' ORDER BY DateTime DESC""

Log Parser pauses the screen output every 10 records with the instruction "Press a key..." This behaviour can be changed by using the switch -rtp:<number>, where number is the number of records to display before the "Press a key..." prompt is displayed, for example -rtp:20

To suppress the "Press a key..." prompt altogether use -rtp:-1
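
For example, the first query above should run without any pauses with something like:

logparser -i:ncsa -rtp:-1 "SELECT * FROM 'E:\web server logs\originals\brisray.com-access-20210704.log'"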

Mike Lichtenberg has over 50 more complex example queries on LichtenBytes.

PowerShell

What I wanted to do with these large log files was to split them out to new files based on month and year. The following code will accomplish this in PowerShell...


$data = get-content "E:\web server logs\originals\brisray.com-access-20210704.log"

foreach($line in $data)
{
    # Append each line to a new file named for the month and year it belongs to
    if($line.Contains("/Oct/2018"))
    {
        $line | out-file -filepath "E:\web server logs\brisray-access-2018-10.log" -Append
    }
    if($line.Contains("/Nov/2018"))
    {
        $line | out-file -filepath "E:\web server logs\brisray-access-2018-11.log" -Append
    }
    if($line.Contains("/Dec/2018"))
    {
        $line | out-file -filepath "E:\web server logs\brisray-access-2018-12.log" -Append
    }
}

The PowerShell Get-Content cmdlet is not very fast if the input file is large. Better code would be...


$path = 'E:\web server logs\originals\brisray.com-access-20210704.log'
$r = [IO.File]::OpenText($path)
while ($r.Peek() -ge 0) {
    # Read and process one line at a time instead of loading the whole file into memory
    $line = $r.ReadLine()
    if($line.Contains("/Oct/2018"))
    {
        $line | out-file -filepath "F:\server logs\brisray-access-2018-10.log" -Append
    }
    if($line.Contains("/Nov/2018"))
    {
        $line | out-file -filepath "F:\server logs\brisray-access-2018-11.log" -Append
    }
    if($line.Contains("/Dec/2018"))
    {
        $line | out-file -filepath "F:\server logs\brisray-access-2018-12.log" -Append
    }
}
$r.Close()

The date is formatted differently in Apache error logs. For those the comparison I used was:

if($line -like "* Nov* 2018*")
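
Put into the same streaming loop as before, splitting out an error log looks something like this sketch (the error log file name and the output path here are placeholders, not the files I actually used):

$path = 'E:\web server logs\originals\error.log'
$r = [IO.File]::OpenText($path)
while ($r.Peek() -ge 0) {
    $line = $r.ReadLine()
    # Apache error logs write the date in a form such as [Sun Nov 04 ... 2018]
    if($line -like "* Nov* 2018*")
    {
        $line | out-file -filepath "F:\server logs\brisray-error-2018-11.log" -Append
    }
}
$r.Close()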

PowerShell documentation

PowerShell Encoding

Be very careful of PowerShell file encoding!

In PowerShell I used [System.Text.Encoding]::Default to find the system encoding. That said it was Windows-1252, which matches ASCII (and so UTF-8) only for its first 128 characters. After processing 25Gb of old log files into month/year files, the old web log analyzer Analog CE could not read a single line of them!

How PowerShell encodes files when saving is version dependent. On Windows 10, typing $PSVersionTable in PowerShell gave me version 5.1.19041.1023. According to $OutputEncoding, the default output encoding is supposed to be 7-bit US-ASCII, codepage 20127. Everything seemed fine but Analog could not read the files!
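
For reference, these are the checks mentioned above, typed at a PowerShell prompt (the comments note what they reported on this machine):

[System.Text.Encoding]::Default    # reported Windows-1252 as the system encoding
$PSVersionTable                    # reported version 5.1.19041.1023
$OutputEncoding                    # supposedly 7-bit US-ASCII, codepage 20127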

Analog unable to read log files

The text files themselves looked fine in Notepad but there was obviously something wrong with them as Analog could easily read the original files. I opened the files in Notepad++. Using that I checked the line ending code (Edit > EOL Conversion) and the encoding type (Encoding). The original files from Apache use Unix-type LF line endings and are UTF-8 encoded. The files I processed in PowerShell had CRLF Windows line endings and were UTF-16 LE BOM encoded.

Using Notepad++ to change the line endings and encoding, I found that Analog CE does not care whether the line endings are LF or CRLF and can read ANSI, UTF-8 and UTF-8 BOM (byte order mark) encoded files. It cannot read UTF-16 encoded files at all.

In PowerShell, to avoid this problem, the -Encoding switch can be used with Out-File to force the output file encoding (see the Out-File documentation).
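
For example, any of the Out-File calls in the split loops above could be made to write UTF-8 directly; a sketch, assuming UTF-8 is the encoding wanted:

$line | out-file -filepath "E:\web server logs\brisray-access-2018-10.log" -Append -Encoding utf8

Note that on Windows PowerShell 5.1 the utf8 option writes a byte order mark, which, as noted above, Analog can still read.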

But what to do with the 557 files taking up 25Gb that I'd already messed up? Luckily there's a PowerShell command to change the encoding:

get-item 'D:\web server logs backups\*.*' | foreach-object {get-content $_ | out-file ("D:\web server logs backups\new\" + $_.Name) -encoding utf8}

After running the command, the files are all readable by Analog again:

Analog readable log files
This page created July 8, 2021; last modified July 31, 2021