UserAgent Test Data
This document describes the test data files used in DeviceMap tests.
UserAgentString.txt
Columns :
- UserAgentString : nvarchar(1500)
Currently contains 918,709 unique user agent strings.
Most of them were collected from the access logs of live web servers.
102,121 of these were identified as belonging to mobile or other devices.
UserAgentDetail.txt
Pipe-separated text file.
Columns :
- StringHash : varbinary(32) : hashbytes('SHA2_256', UserAgentString)
- TypeId : int
- Flag : int
Because a user agent string can contain virtually any character, no separator character is safe for splitting columns. The strings are therefore kept on their own in UserAgentString.txt, and their properties are stored in UserAgentDetail.txt.
Each user agent string is linked to its detail record via its SHA-256 hash. (In an RDBMS such as MS SQL Server, adding the hash as a persisted computed column speeds things up considerably.)
The TypeId field is the PK or Id of the Types listed in UserAgentType.txt.
The Flag field is used to mark user agent strings so that the same set can be used in different tests (see below).
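As a rough illustration of how a user agent string could be matched to its detail record, the Python sketch below computes the SHA-256 hash of a string. It assumes the hash was produced by SQL Server's hashbytes over an nvarchar value, which hashes the UTF-16LE bytes of the string. The function name string_hash, the sample value, and the uppercase hex rendering are illustrative assumptions; the exact text form of StringHash in the file is not specified here.

```python
import hashlib


def string_hash(user_agent: str) -> str:
    """Return the SHA-256 hash of a user agent string as uppercase hex.

    Assumption: the string is encoded as UTF-16LE, which is how SQL Server
    stores nvarchar data, so the digest should correspond to
    hashbytes('SHA2_256', UserAgentString).
    """
    return hashlib.sha256(user_agent.encode("utf-16-le")).hexdigest().upper()


# Hypothetical sample value, for illustration only.
sample = "Mozilla/5.0 (Linux; Android 4.4; Nexus 5)"
print(string_hash(sample))
```

If the hashes in the file do not match, the encoding is the first thing to check: hashing the UTF-8 bytes of the same string gives a different digest.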
UserAgentType.txt
Pipe-separated text file.
Columns :
- Id : int
- Type : nvarchar(50)
UserAgentType.txt lists 76 types of user agent strings (some of which are debatable).
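For readers who want to inspect the type list outside a database, here is a minimal Python sketch that parses pipe-separated Id and Type rows into a dictionary. It assumes the file has no header row and that each row has exactly these two columns; the helper name load_types and the sample rows are hypothetical.

```python
import csv
import io


def load_types(lines):
    """Parse pipe-separated 'Id|Type' rows into a dict of {id: type name}."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    # Skip blank lines; assumes there is no header row.
    return {int(row[0]): row[1] for row in reader if row}


# Hypothetical rows, for illustration only; the real ids and names may differ.
sample = io.StringIO("1|Browser\n2|Mobile Browser\n")
types = load_types(sample)
print(types[2])
```

With the real file, the same function could be given an open file handle for UserAgentType.txt instead of the in-memory sample.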
UserAgentDevice.txt
Pipe-separated text file.
Columns :
- StringHash : UserAgent SHA-256 hash
- OpenDdr : OpenDdr device Id found via OpenDdr code
- DeviceMap : OpenDdr device Id found via DeviceMapClient code
- Flag : used to separate data sets for testing
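One possible use of this file is to check how often the two code paths agree on a device. The Python sketch below counts rows whose OpenDdr and DeviceMap ids differ, optionally restricted to one Flag value. It assumes four pipe-separated columns in the order listed above and no header row; the function name count_mismatches and the sample rows are hypothetical.

```python
def count_mismatches(lines, flag=None):
    """Count rows where the OpenDdr and DeviceMap device ids differ.

    Each line is expected as 'StringHash|OpenDdr|DeviceMap|Flag'.
    If flag is given, only rows with that Flag value are considered.
    Returns a (rows_considered, mismatches) tuple.
    """
    total = mismatches = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        _string_hash, open_ddr, device_map, row_flag = line.split("|")
        if flag is not None and row_flag != str(flag):
            continue
        total += 1
        if open_ddr != device_map:
            mismatches += 1
    return total, mismatches


# Hypothetical rows, for illustration only.
rows = [
    "AA01|nexus5|nexus5|1",
    "AA02|iphone|ipad|1",
    "AA03|desktop|desktop|2",
]
print(count_mismatches(rows, flag=1))
```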
Testing
For testing, the data is best loaded into an RDBMS.
This is the general procedure I use :
1. Create instance of client/parser class
2. GetDataSet : SELECT PK and UserAgentString : random, based on type or flagged dataset
3. 'cold' run using 3 pre-selected user agent strings
4. For each UserAgentString in DataSet
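The procedure can be sketched in Python roughly as follows. This is only an outline under stated assumptions: EchoParser is a hypothetical stand-in and not a real DeviceMap or OpenDDR API, the data set is an in-memory list rather than the result of a SELECT, and timing each call inside the loop is an assumed example of what the per-string step might do.

```python
import time


class EchoParser:
    """Hypothetical stand-in for a client/parser class (not a real API)."""

    def classify(self, user_agent: str) -> str:
        return "mobile" if "Mobile" in user_agent else "other"


def run_test(parser, data_set, warmup):
    """Do a 'cold' run, then time one classify call per user agent string."""
    for user_agent in warmup:  # 'cold' run; results are discarded
        parser.classify(user_agent)
    results = []
    for pk, user_agent in data_set:  # loop over (PK, UserAgentString) pairs
        start = time.perf_counter()
        device = parser.classify(user_agent)
        elapsed = time.perf_counter() - start  # assumed per-string measurement
        results.append((pk, device, elapsed))
    return results


# Hypothetical data, for illustration only.
warmup = ["Mozilla/5.0", "Opera/9.80", "curl/7.0"]
data_set = [(1, "Mozilla/5.0 (iPhone) Mobile Safari"), (2, "Mozilla/5.0 (X11; Linux)")]
results = run_test(EchoParser(), data_set, warmup)
print([(pk, device) for pk, device, _ in results])
```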