UserAgent Test Data

This document describes the test data files used in DeviceMap tests.

SVN TestData

UserAgentString.txt

Columns:

The file currently contains 918,709 unique user agent strings.

Most of them were collected from the access logs of live web servers.

Of the 918,709 strings, 102,121 were identified as belonging to mobile or other devices.

UserAgentDetail.txt

Pipe-separated text file.

Columns:

Because no separator character can be relied on never to appear inside a user agent string, the user agent strings themselves are kept in UserAgentString.txt, separate from their properties in UserAgentDetail.txt.
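The following Python sketch illustrates this reasoning. It is not part of the test data tooling. It shows what goes wrong when a user agent string that contains the separator is stored inline in a pipe-separated record. The UA string and the TypeId value are invented for the example.

```python
# Illustration only: the UA string and the TypeId value below are invented.

def split_record(line, sep="|"):
    """Naively split one delimited record into its fields."""
    return line.rstrip("\n").split(sep)

# A hypothetical UA string that happens to contain the separator.
ua = "Mozilla/5.0 (compatible; ExampleBot/1.0 |+http://example.com/bot)"

# Storing it inline next to a TypeId field yields an ambiguous record.
record = ua + "|" + "42"
fields = split_record(record)
print(len(fields))  # 3 fields instead of the intended 2

# Keeping one raw UA string per line needs no separator at all,
# so the string round-trips intact (assuming it contains no line break).
stored = ua + "\n"
recovered = stored.rstrip("\n")
print(recovered == ua)  # True
```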

Each user agent string is linked to its detail record via its SHA-256 (SHA-2, 256-bit) hash. (In an RDBMS such as MS SQL Server, adding this hash as a persisted computed column speeds things up considerably.)
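Below is a minimal Python sketch of the hash-based link. It assumes the hash is the hex-encoded SHA-256 of the user agent string. Two details are not documented here: which text encoding was hashed, and the name of the hash column in UserAgentDetail.txt. So `encoding` is a parameter, and the `"Hash"` key is a hypothetical column name. Note that MS SQL's HASHBYTES with SHA2_256, applied to an nvarchar value, hashes its UTF-16LE bytes. That produces a different digest than hashing the UTF-8 bytes.

```python
import hashlib

def ua_hash(ua, encoding="utf-8"):
    """Return the lowercase hex SHA-256 digest of a user agent string.

    ASSUMPTION: the encoding that was hashed is not documented. Use
    encoding="utf-16-le" if the hashes came from HASHBYTES over nvarchar.
    """
    return hashlib.sha256(ua.encode(encoding)).hexdigest()

def link_details(ua_strings, details, encoding="utf-8"):
    """Pair each detail record with its UA string via the SHA-256 hash.

    `details` is an iterable of dicts; "Hash" is a hypothetical column name.
    Hex case is normalised so uppercase digests also match. Detail records
    whose hash matches no UA string are skipped.
    """
    # Hash every UA string once, then look detail records up by digest.
    by_hash = {ua_hash(ua, encoding): ua for ua in ua_strings}
    for detail in details:
        ua = by_hash.get(detail["Hash"].lower())
        if ua is not None:
            yield ua, detail

# Usage with invented data (TypeId and Flag values are made up).
uas = ["Mozilla/5.0 (Linux; Android 4.4; Mobile)", "curl/7.35.0"]
details = [{"Hash": ua_hash("curl/7.35.0").upper(), "TypeId": 1, "Flag": 0}]
pairs = list(link_details(uas, details))
print(pairs[0][0])  # curl/7.35.0
```

The sketch hashes each string once up front and then looks detail records up in a dictionary. This is roughly the in-memory analogue of precomputing the hash column rather than hashing on every query.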

The TypeId field holds the Id (the primary key) of one of the types listed in UserAgentType.txt.

The Flag field is used to mark user agent strings so that the same set can be used in different tests (see below).

UserAgentType.txt

Pipe-separated text file.

Columns:

  • Id int
  • Type nvarchar(50)

UserAgentType.txt lists 76 types of user agent strings (some of which are debatable).
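UserAgentType.txt has only the two documented columns, so resolving a detail record's TypeId to a type name is a simple dictionary lookup. Here is a Python sketch. It assumes one `Id|Type` record per line, with an optional header row. The sample type names are invented, not taken from the real file.

```python
import io

def load_types(lines):
    """Parse Id|Type lines into a dict mapping Id (int) to Type (str).

    Blank lines are skipped, as is any line whose Id field is not an
    integer (e.g. a header row, if the file has one; this is an assumption).
    A line with no pipe at all raises ValueError.
    """
    types = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        type_id, type_name = line.split("|", 1)
        if not type_id.strip().isdigit():
            continue
        types[int(type_id)] = type_name.strip()
    return types

# Invented sample content standing in for UserAgentType.txt.
sample = io.StringIO("Id|Type\n1|Browser\n2|Robot\n")
types = load_types(sample)

# Resolve the TypeId of a (hypothetical) detail record.
detail = {"TypeId": 2}
print(types[detail["TypeId"]])  # Robot
```

For the real file you would pass an open file handle instead, e.g. `open("UserAgentType.txt", encoding="utf-8")`. The file's actual encoding is an assumption.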

UserAgentDevice.txt

Pipe-separated text file.

Columns:

Testing

For testing, the data is best loaded into an RDBMS.

This is the general procedure I use:

  1. Create an instance of the client/parser class.
  2. GetDataSet: SELECT PK and UserAgentString, either at random, based on type, or from a flagged dataset.
  3. Do a 'cold' run using 3 pre-selected user agent strings.
  4. For each UserAgentString in DataSet
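The procedure above might look roughly like the following Python sketch, which uses an in-memory SQLite database in place of MS SQL. Everything named here is hypothetical: the flattened `UserAgent` table (Id, UserAgentString, TypeId, Flag), the `HypotheticalParser` stand-in for the client/parser class, and the sample rows. The body of step 4 is not given, so timing each classification is only an assumption about what that loop does.

```python
import sqlite3
import time

class HypotheticalParser:
    """Stand-in for the client/parser class under test (not a real API)."""
    def classify(self, ua):
        return "mobile" if "Mobile" in ua else "other"

def get_data_set(conn, mode="random", type_id=None, flag=None, limit=1000):
    """Step 2: select PK and UserAgentString at random, by type, or by flag."""
    base = "SELECT Id, UserAgentString FROM UserAgent"
    if mode == "type":
        sql, args = base + " WHERE TypeId = ? LIMIT ?", (type_id, limit)
    elif mode == "flag":
        sql, args = base + " WHERE Flag = ? LIMIT ?", (flag, limit)
    else:
        # SQLite's RANDOM(); on MS SQL, ORDER BY NEWID() is the usual equivalent.
        sql, args = base + " ORDER BY RANDOM() LIMIT ?", (limit,)
    return conn.execute(sql, args).fetchall()

def run_test(conn, parser, cold_uas, **select):
    # Step 3: 'cold' run so one-off start-up cost is not measured.
    for ua in cold_uas:
        parser.classify(ua)
    # Step 4 (assumed): classify each UA in the data set and time it.
    results = []
    for pk, ua in get_data_set(conn, **select):
        start = time.perf_counter()
        outcome = parser.classify(ua)
        results.append((pk, outcome, time.perf_counter() - start))
    return results

# Invented sample data loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserAgent (Id INTEGER PRIMARY KEY, "
             "UserAgentString TEXT, TypeId INTEGER, Flag INTEGER)")
conn.executemany(
    "INSERT INTO UserAgent (UserAgentString, TypeId, Flag) VALUES (?, ?, ?)",
    [("Mozilla/5.0 (iPhone; Mobile)", 1, 1),
     ("curl/7.35.0", 2, 0),
     ("Mozilla/5.0 (Linux; Android; Mobile)", 1, 1)])

parser = HypotheticalParser()  # step 1
cold = ["curl/7.35.0", "Mozilla/5.0 (iPhone; Mobile)", "Wget/1.15"]
results = run_test(conn, parser, cold, mode="flag", flag=1)
for pk, outcome, seconds in results:
    print(pk, outcome, f"{seconds:.6f}")
```

The cold pass presumably keeps one-time initialisation cost out of the per-string timings. Selecting by Flag is what lets exactly the same set be rerun across different tests.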