- Accepted and presented at the:
- 20th Annual Southwest Business Symposium
- Edmond, Oklahoma
- April 17-18, 2003
Overview of Web Traffic Analysis Software
William Rosener
Management Information Systems
Northeastern State University
ABSTRACT
This paper examines how web traffic analysis software
can be used to help understand who is visiting a website,
where they came from, and what types of information
they requested.
This software can also be used to model visitor
behavior, such as when visitors leave a site and what
their buying patterns are. This information can be
used to internally justify a company's investment in the
web, help increase the return on that investment, or
help the company better manage its web site. Altogether,
this software allows conclusions to be drawn from the
volume of individual requests made to a server.
INTRODUCTION
As web sites become an integral part of an
organization's operations and external communications,
it is becoming increasingly important to
understand, model, and utilize web site traffic and
visitors' online behavior effectively. By examining web
log files, it is possible to determine not only what, when,
and by whom documents are being viewed, but log files
can also provide information regarding server load,
unsuccessful requests, and valuable marketing
information. An analysis of log files can help an
organization better understand their visitors and the
actions they take on a web site. This information can be
used to internally justify a company's investment in the
web, help increase their return on their investment, or
help them manage their web site.
In addition to knowing what you can learn from a log
file, it is equally important to understand what you can't
learn from a log file. In particular, it helps to know
what types of data are not captured in log files, what
types of data are inherently incomplete, and what types
of incorrect inferences can be made from log files.
WHAT IS RECORDED IN A LOG FILE
Every request from a client browser is recorded in the
server's log files. For a busy server this can result in
hundreds or thousands of entries being recorded per
hour. Depending on the server and how it is configured,
the following information is typically recorded.
- Address of the computer requesting the file
- Date and time the request was made
- URL of the file requested
- Protocol used to request the file
- Size of the file requested
- Referring URL
- Type of browser making the request
- Operating system used by the requesting computer
Below is an actual example of a log file entry.
T59982.nsuok.edu - - [13/Jan/2003:13:39:12 -0500]
"GET /athletics/ HTTP/1.1" 200 9980 "http://www.nsuok.edu"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
| Information | What it stands for |
|---|---|
| T59982.nsuok.edu | Address of the computer requesting the file |
| [13/Jan/2003:13:39:12 -0500] | Date and time the request was made |
| /athletics/ | URL of the file requested |
| HTTP/1.1 | Protocol used to request the file |
| 200 | Status code (successful GET) |
| 9980 | Size of the file in bytes |
| http://www.nsuok.edu | Referring URL |
| Mozilla/4.0 (compatible; MSIE 6.0) | Type of browser |
| Windows 98 | Operating system |
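The field breakdown above can be sketched in code. The following is a minimal illustration, assuming the entry is stored on a single line in the layout shown in the example; the names `LOG_PATTERN` and `parse_entry` are hypothetical and are not part of any package discussed in this paper. Servers can be configured to record different fields, so the pattern would need adjusting for other layouts.

```python
import re
from datetime import datetime

# Pattern for the log layout shown in the example entry (an assumption:
# other server configurations record different fields).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


entry = parse_entry(
    'T59982.nsuok.edu - - [13/Jan/2003:13:39:12 -0500] '
    '"GET /athletics/ HTTP/1.1" 200 9980 "http://www.nsuok.edu" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"'
)
print(entry["host"], entry["url"], entry["status"], entry["size"])
```

Lines that do not fit the pattern return `None` rather than raising an error, so a malformed entry does not stop the processing of a large log file.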
USAGE STATISTICS THAT CAN BE DETERMINED
The data contained in a log file can be analyzed in
various ways. This information can provide the
following statistics.
- Number of requests (hits)
- Total number of files served
- Total number of kilobytes downloaded
- Types of files downloaded (HTML, GIF, JPEG)
- Total number of times a file was requested
- Unique number of IP addresses requesting files
- Breakdown of domains requesting files
- Status of each request (successful, failed, or redirected)
- Totals and the averages for specific time periods
(hour, day, week, or year)
- Browser version making the request
- Referring page – how did the user reach this page
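Several of the statistics listed above reduce to simple counting once the log entries have been parsed. The sketch below assumes each entry has already been reduced to a `(host, url, status, bytes)` tuple; the sample records are hand-made for illustration and are not real log data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-made records: (host, url, status, bytes). Not real log data.
records = [
    ("a.example.edu", "/index.html", 200, 5000),
    ("a.example.edu", "/logo.gif", 200, 1200),
    ("b.example.net", "/index.html", 200, 5000),
    ("b.example.net", "/missing.html", 404, 300),
    ("c.example.com", "/old.html", 302, 0),
]


def status_class(code):
    """Group an HTTP status code into the categories used in the list above."""
    if 200 <= code < 300:
        return "successful"
    if 300 <= code < 400:
        return "redirected"
    if 400 <= code < 600:
        return "failed"
    return "other"


def summarize(records):
    """Compute basic usage statistics from parsed log records."""
    return {
        "hits": len(records),
        "files_served": sum(1 for r in records if 200 <= r[2] < 300),
        "kilobytes": sum(r[3] for r in records) / 1024,
        "unique_hosts": len({r[0] for r in records}),
        "by_status": Counter(status_class(r[2]) for r in records),
        "by_url": Counter(r[1] for r in records),
    }


stats = summarize(records)
print(stats["hits"], stats["files_served"], stats["unique_hosts"])
```

Here "files served" is taken to mean successful requests only, which is one possible reading of the term; a given analysis package may define it differently.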
INFERENCES THAT CAN BE MADE
Advanced web traffic analysis software can even
provide behavioral data about visitors. By examining
log files more closely, this software can help:
1) identify when visitors are leaving your website,
2) understand visitors' buying patterns and content
interests, 3) sort visitors by demographics and
browsing behaviors, 4) quantify the mix of visitors,
including the number of new, repeat, and unique
visitors, and 5) optimize a company's marketing
dollars. Listed below are other statistics that can be
determined.
- Top paths through the site
- Single access pages
- Top exit pages
- Top entry pages
- Most active organizations
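Entry pages, exit pages, single access pages, and paths all depend on grouping requests into visits. The sketch below is one way this could be done, under two assumptions that do not come from this paper: requests from the same host belong to the same visitor, and a gap of more than 30 minutes starts a new visit. The sample requests and all names are hypothetical.

```python
from collections import Counter, defaultdict

VISIT_TIMEOUT = 30 * 60  # seconds; an assumed cutoff, not taken from the paper

# Hypothetical page requests: (host, seconds since start of log, url)
requests = [
    ("a.example.edu", 0, "/"),
    ("a.example.edu", 60, "/athletics/"),
    ("a.example.edu", 120, "/schedules/"),
    ("b.example.net", 30, "/athletics/"),
    ("a.example.edu", 4000, "/"),
]


def split_visits(requests, timeout=VISIT_TIMEOUT):
    """Group requests by host, then split each host's requests into visits."""
    by_host = defaultdict(list)
    for host, when, url in sorted(requests, key=lambda r: r[1]):
        by_host[host].append((when, url))
    visits = []
    for hits in by_host.values():
        current = [hits[0][1]]
        for (prev_time, _), (when, url) in zip(hits, hits[1:]):
            if when - prev_time > timeout:
                # Long silence: close the current visit and start a new one.
                visits.append(current)
                current = []
            current.append(url)
        visits.append(current)
    return visits


visits = split_visits(requests)
entry_pages = Counter(v[0] for v in visits)
exit_pages = Counter(v[-1] for v in visits)
single_access = Counter(v[0] for v in visits if len(v) == 1)
top_paths = Counter(tuple(v) for v in visits)
print(entry_pages.most_common(1), len(visits))
```

Because a single address may represent many people, or a spider, visits reconstructed this way are estimates rather than exact counts.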
USAGE STATISTICS THAT CANNOT BE DETERMINED
While many statistics can be compiled by examining log
files, there are still some types of data and inferences
that cannot be derived from log files.
- Most individual characteristics, such as a person's
age or gender, are not recorded. While it is possible
to capture a user's name and e-mail address, this
information is typically not recorded since there is
no way to verify its accuracy. In addition, the IP
address recorded does not necessarily correspond to
a person. It could belong to an Internet Service
Provider where multiple users are all represented by
a single IP address, or to a spider retrieving
documents for a search engine.
- Where the user went next is not recorded. The
only way to determine where the user went next
would be to examine the log files of the next site the
user visited.
- The reasons requests are made are not recorded.
The motivations for a user visiting a site, how the
user felt about a site, and how the viewed files were
used are not recorded.
Log data is also inherently incomplete. Today many
large-scale caches are used to help reduce response and
download times. If the browser finds the file in any
intermediary cache, the request is never recorded by the
server where the original document is located. Similarly,
if a site is mirrored, then the log files from all mirror
sites must be added together to obtain complete totals.
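Adding together the results from mirrored sites can be sketched as follows, assuming per-page request counts have already been compiled for each site. The counts shown are invented for illustration, and `merge_counts` is a hypothetical helper.

```python
from collections import Counter


def merge_counts(*per_site_counts):
    """Add together per-page request counts compiled from several sites."""
    total = Counter()
    for counts in per_site_counts:
        total.update(counts)  # update() adds counts rather than replacing them
    return total


# Hypothetical per-page counts from a main site and one mirror.
main_site = Counter({"/index.html": 120, "/athletics/": 45})
mirror_site = Counter({"/index.html": 80, "/schedules/": 10})

merged = merge_counts(main_site, mirror_site)
print(merged["/index.html"])
```

Requests answered by intermediary caches never reach any of the servers, so even the merged totals understate actual usage.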
WEB TRAFFIC ANALYSIS SOFTWARE
There are numerous applications available to help
analyze log files. Below is a partial listing of web traffic
analysis software. A more complete listing of log
analysis software can be viewed at HREF 5. These
software packages are very competitively priced. For
example, a single-domain license of WUSAGE 8.0 costs
$75. A web-hosting license of WUSAGE 8.0, which
reports on unlimited virtual domains located at a single
physical location, costs $295.
- NetTracker (http://www.sane.com/products/NetTracker)
- WUSAGE (http://boutell.com/wusage)
- WebTrends (http://www.webtrends.com)
- Webalizer (http://mrunix.net/webalizer)
EXAMPLES OF WEB TRAFFIC ANALYSIS SOFTWARE
Below are some actual examples of usage statistics. The
charts and tables were created using WUSAGE 8.
Top 10 Browsers (User Agents).
Sorted by Access Count
| Rank | Product | % |
|---|---|---|
| 1 | Microsoft Internet Explorer 6.0 | 49.12 |
| 2 | Microsoft Internet Explorer 5.0 | 38.05 |
| 3 | Netscape 4.0 | 7.78 |
| 4 | Netscape 5.0 | 1.86 |
| 5 | Microsoft Internet Explorer 4.0 | 1.00 |
| 6 | Netscape 3.0 | 0.49 |
| 7 | Googlebot/2.1 | 0.35 |
| 8 | MSProxy/2.0 | 0.17 |
| 9 | Wget/1.8.1 | 0.16 |
| 10 | Microsoft URL Control - 6.00.8862 | 0.09 |
Screen Depth (Number of Colors).
Note: computers reporting 16 colors may in some cases
be grayscale devices. Most 16-color computers are
Windows machines temporarily running in safe mode.
| # | Color Depth | Computers | % |
|---|---|---|---|
| 1 | Black and White (1 bit) | 31,505 | 0.12 |
| 2 | 4 Gray Shades (2 bit) | 73 | 0.00 |
| 3 | 16 Colors (4 bit) | 22,685 | 0.09 |
| 4 | 256 Colors (8 bit) | 907,830 | 3.51 |
| 5 | 65,536 Colors (16 bit) | 12,353,149 | 47.80 |
| 6 | Millions of Colors (24 bit) | 3,354,058 | 12.98 |
| 7 | Millions of Colors (32 bit) | 9,171,476 | 35.49 |
Top 10 Visitor Domains.
Sorted by Access Count
| Rank | Domain | Accesses | % of Accesses | Bytes | % of Bytes | Visits | % of Visits |
|---|---|---|---|---|---|---|---|
| 1 | edu | 256,979 | 49.11 | 2,977,112,827 | 49.85 | 61,630 | 44.41 |
| 2 | net | 110,722 | 21.16 | 1,343,720,945 | 22.50 | 38,731 | 27.91 |
| 3 | unknown | 72,732 | 13.90 | 655,430,928 | 10.98 | 19,228 | 13.85 |
| 4 | com | 72,043 | 13.77 | 864,868,021 | 14.48 | 15,319 | 11.04 |
| 5 | gov | 3,110 | 0.59 | 38,923,391 | 0.65 | 1,216 | 0.88 |
| 6 | n_america | 1,650 | 0.32 | 21,540,140 | 0.36 | 679 | 0.49 |
| 7 | asia | 1,550 | 0.30 | 13,199,266 | 0.22 | 344 | 0.25 |
| 8 | mil | 1,457 | 0.28 | 20,541,590 | 0.34 | 487 | 0.35 |
| 9 | org | 1,165 | 0.22 | 11,433,053 | 0.19 | 489 | 0.35 |
| 10 | europe | 1,020 | 0.19 | 13,184,515 | 0.22 | 286 | 0.21 |
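A domain breakdown like the one above can be approximated by taking the last label of each requesting host name. The sketch below is illustrative only: it assumes hosts that appear as bare IP addresses are counted as "unknown", and it does not attempt the regional groupings (such as n_america or europe) that appear in the table. The host names other than the one from the example log entry are hypothetical.

```python
from collections import Counter


def top_level_domain(host):
    """Return the last label of a host name, or 'unknown' for bare addresses."""
    if "." not in host:
        return "unknown"
    last = host.rsplit(".", 1)[-1].lower()
    # A numeric last label means the address was never resolved to a name.
    return last if last.isalpha() else "unknown"


def domain_breakdown(hosts):
    """Return (domain, accesses, percent) rows sorted by access count."""
    counts = Counter(top_level_domain(h) for h in hosts)
    total = sum(counts.values())
    return [
        (domain, n, round(100 * n / total, 2))
        for domain, n in counts.most_common()
    ]


# One host from the example log entry plus hypothetical ones.
hosts = ["T59982.nsuok.edu", "x.example.edu", "dial1.example.net", "10.0.0.7"]
for row in domain_breakdown(hosts):
    print(row)
```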
Top 10 Search Keywords.
Sorted by Access Count
Keywords used to reach this site via search engines,
such as Altavista and Infoseek.
| Rank | Search Keyword(s) | Accesses | % |
|---|---|---|---|
| 1 | Northeastern State University | 4,416 | 7.50 |
| 2 | Northeastern | 2,371 | 4.03 |
| 3 | State | 1,814 | 3.08 |
| 4 | nsuok | 1,416 | 2.41 |
| 5 | Academic schedules | 1,034 | 1.76 |
| 6 | Tahlequah | 842 | 1.43 |
| 7 | Broken Arrow | 798 | 1.36 |
| 8 | training evaluation | 768 | 1.30 |
| 9 | shareware | 713 | 1.21 |
| 10 | information technology | 658 | 1.12 |
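Search keywords are typically recovered from the referring URL, since many search engines place the search phrase in the query string of the results page. The sketch below assumes the phrase is carried in a parameter named `q` or `query`; the actual parameter name differs between search engines, and the search URL shown is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names; each search engine uses its own.
QUERY_PARAMETERS = ("q", "query")


def search_keywords(referrer):
    """Return the search phrase found in a referring URL, or None."""
    query = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMETERS:
        if name in query:
            # parse_qs already decodes '+' and %-escapes into plain text.
            return query[name][0].strip().lower()
    return None


print(search_keywords(
    "http://search.example.com/find?q=Northeastern+State+University"
))
print(search_keywords("http://www.nsuok.edu"))
```

Lower-casing the phrase groups together searches that differ only in capitalization; whether a given package does this is not stated in the source.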
Most Downloaded File Types.
Sorted by Access Count
| Rank | File type | Files | K Bytes Transferred |
|---|---|---|---|
| 1 | gif | 1,522,220 | 1,671,971 |
| 2 | jpg | 449,425 | 5,660,605 |
| 3 | html | 230,096 | 8,521,768 |
| 4 | js | 57,849 | 1,289,407 |
| 5 | pdf | 32,906 | 1,049,564 |
| 6 | htm | 6,543 | 90,479 |
| 7 | txt | 1,182 | 482 |
| 8 | class | 266 | 2,494 |
| 9 | css | 185 | 236 |
| 10 | com/ | 5 | 165 |
| | Total Files & K Bytes Transferred | 2,300,677 | 18,287,166 |
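A file type report of this kind can be approximated by grouping requests on the extension of the requested URL. The following sketch assumes each request has been reduced to a `(url, bytes)` pair; the sample records and function names are hypothetical, and requests without an extension (such as directory URLs) are grouped under `(none)`.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def file_type(url):
    """Return the lower-cased extension of the requested path, or '(none)'."""
    path = urlsplit(url).path  # drops any query string
    extension = posixpath.splitext(path)[1]
    return extension[1:].lower() if extension else "(none)"


def tally_file_types(records):
    """Map each file type to [number of files, bytes transferred]."""
    totals = defaultdict(lambda: [0, 0])
    for url, size in records:
        entry = totals[file_type(url)]
        entry[0] += 1
        entry[1] += size
    return dict(totals)


# Hypothetical requests: (url, bytes)
records = [
    ("/logo.gif", 1200),
    ("/images/banner.GIF", 3000),
    ("/index.html", 5000),
    ("/athletics/", 9980),
    ("/catalog.pdf?page=2", 40000),
]
totals = tally_file_types(records)
for kind, (files, size) in sorted(totals.items(), key=lambda kv: -kv[1][0]):
    print(kind, files, size)
```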
Screen Resolution.
Computers offering at least 640x480 pixels, but less
than 800x600 pixels, are reported as 640x480, and so
on.
| # | Resolution | Computers | % |
|---|---|---|---|
| 1 | 640x480 | 867,629 | 3.36 |
| 2 | 800x600 | 9,208,487 | 35.62 |
| 3 | 1024x768 | 12,830,967 | 49.64 |
| 4 | 1280x1024 | 2,421,307 | 9.37 |
| 5 | 1600x1200 and above | 520,560 | 2.01 |
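The reporting rule described above, where a screen is counted under the largest listed resolution it fully meets, can be sketched as follows. This assumes that both the width and the height must reach a category's values, which is one reading of the rule; how WUSAGE itself makes the comparison is not stated here. The name `resolution_bucket` is hypothetical.

```python
# Reporting categories from the table above, smallest first.
BUCKETS = [(640, 480), (800, 600), (1024, 768), (1280, 1024), (1600, 1200)]


def resolution_bucket(width, height):
    """Return the reporting category for a screen size, or None if below 640x480."""
    label = None
    for bucket_width, bucket_height in BUCKETS:
        # Assumption: both dimensions must meet the category's minimum.
        if width >= bucket_width and height >= bucket_height:
            label = f"{bucket_width}x{bucket_height}"
    if label == "1600x1200":
        label = "1600x1200 and above"
    return label


print(resolution_bucket(720, 576))
print(resolution_bucket(1920, 1200))
```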
CONCLUSIONS
Unlike customers in other marketing venues, visitors to
a web site are typically anonymous. Web traffic analysis
software, however, can help marketers better understand
who is visiting a web site and why. This software can help
identify the types of companies that are visiting a
website, where they came from, and what types of
information they requested. Advanced web traffic
analysis software can even provide behavioral data
about the visitors, such as when visitors are leaving
a site and what their buying patterns are. Altogether,
these software packages allow conclusions to be drawn
from the volume of individual requests made to a server.
REFERENCES
- HREF 1: Web Traffic Analysis Software
  Available: http://www.businesswire.com/emk/mwave3.htm
  Cited: January 13, 2003
- HREF 2: WUSAGE 8
  Available: http://boutell.com/wusage
  Cited: January 13, 2003
- HREF 3: Why Web Usage Statistics are Meaningless
  Available: http://www.goldmark.org/netrants/webstats
  Cited: January 13, 2003
- HREF 4: Measuring Web Site Usage: Log File Analysis
  Available: http://www.nlc-bnc.ca/publications/1/p1-256-e.html
  Cited: January 13, 2003
- HREF 5: Log Analyzers
  Available: http://www.networkingfiles.com/LogAnal/loganal.htm
  Cited: January 13, 2003