Defining a Structure
A Hex Workshop structure closely resembles a structure definition in the C programming language, which is familiar to many developers and easily learned by others. Future versions of Hex Workshop will incorporate a structure building tool along with the ability to extract structures from existing C/C++ source files.
The following is an example of a Hex Workshop structure definition. Both C++ comments ("// COMMENT") and C comments ("/* COMMENT */") are supported.
/*
* LocalFileHeader for a .ZIP compressed file.
*/
struct LocalFileHeader
{
char Signature[4]; // PK<0x03><0x04>
#pragma verify match_var_int("Signature[0]", "0x50")
#pragma verify match_var_int("Signature[1]", "0x4B")
#pragma verify match_var_int("Signature[2]", "0x03")
#pragma verify match_var_int("Signature[3]", "0x04")
WORD VersionNeededToExtract;
WORD GeneralPurposeBitFlag;
WORD CompressionMethod;
DOSTIME LastModFileTime;
DOSDATE LastModFileDate;
DWORD Crc32;
DWORD CompressedSize;
DWORD UncompressedSize;
WORD FileNameLength;
WORD ExtraFieldLength;
};
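To make the byte layout implied by this definition concrete, here is a minimal Python sketch that reads the same eleven fields outside of Hex Workshop. It assumes little-endian data and that WORD, DOSTIME, and DOSDATE are 16 bit values and DWORD is a 32 bit value; the helper names (parse_local_file_header, make_sample_zip) are hypothetical and are not part of Hex Workshop.

```python
import io
import struct
import zipfile

# Assumed layout: little-endian, WORD/DOSTIME/DOSDATE = 16 bit, DWORD = 32 bit.
LOCAL_FILE_HEADER = struct.Struct("<4sHHHHHIIIHH")

FIELD_NAMES = (
    "Signature", "VersionNeededToExtract", "GeneralPurposeBitFlag",
    "CompressionMethod", "LastModFileTime", "LastModFileDate", "Crc32",
    "CompressedSize", "UncompressedSize", "FileNameLength", "ExtraFieldLength",
)


def parse_local_file_header(data, offset=0):
    """Unpack the header found at `offset` into a dict keyed by field name."""
    values = LOCAL_FILE_HEADER.unpack_from(data, offset)
    return dict(zip(FIELD_NAMES, values))


def make_sample_zip():
    """Build a tiny in-memory .zip so the sketch has real data to read."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.txt", "hello")
    return buffer.getvalue()


header = parse_local_file_header(make_sample_zip())
```

Each name in FIELD_NAMES corresponds to one member of the structure, in declaration order, which is how the structure maps onto consecutive bytes of the document.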
The general format of a structure is as follows:
struct <<STRUCTURE_NAME>>
{
<<DATA_TYPES_AND_NAMES>>
};
A structure definition begins with the keyword "struct" followed by the structure name. The structure name cannot contain any tabs or spaces. An opening brace "{" marks the beginning of the data declaration and a closing brace "}" marks its end. Lastly, a semicolon ";" marks the end of the structure definition.
Data types are declared in the following form:
<<DATA_TYPE>> <<VARIABLE_NAME>>;
A trailing semicolon is required and variable names cannot contain tabs or spaces.
For a list of basic built-in data types, see Basic Structure Data Types. Additional data types are provided in the standard-type library included with Hex Workshop.
To specify an array of a data type, indicate the desired repetition count of the array, in decimal, surrounded by brackets after the variable name. In the example below, 32 values of data type "int" are defined under the variable myArray:
int myArray[32];
Users may also define variable length arrays within structures. A variable length character string is demonstrated below:
struct pstring16
{
unsigned short len; // 16 bits worth of length
char content[len]; // Actual string content
};
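The effect of a variable length array can be sketched in Python: the length field is read first and then decides how many bytes follow. This is an illustration only, assuming a little-endian unsigned 16 bit length; read_pstring16 is a hypothetical helper name.

```python
import struct


def read_pstring16(data, offset=0):
    """Return (content, next_offset) for a 16 bit length-prefixed string."""
    (length,) = struct.unpack_from("<H", data, offset)  # unsigned short len
    start = offset + 2
    end = start + length
    if end > len(data):
        raise ValueError("declared length runs past the end of the data")
    return data[start:end], end  # char content[len]


content, next_offset = read_pstring16(b"\x05\x00hello!!")
```

The returned next_offset shows that the total size of the structure depends on the data itself, not only on the definition.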
Users can also perform simple calculations within array declarations. For example, consider the need to deserialize a 2-dimensional array of bytes:
struct myByteArray
{
unsigned short rows;
unsigned short columns;
__int8 data[rows*columns];
};
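As a rough illustration of the calculated array length, the Python sketch below reads the two counts, multiplies them, and then reads that many signed 8 bit values. Little-endian counts and a row-major arrangement of the bytes are assumptions made for this example; the structure definition itself only states the total element count.

```python
import struct


def read_byte_grid(data, offset=0):
    """Read rows, columns, then rows*columns signed bytes as a list of rows."""
    rows, columns = struct.unpack_from("<HH", data, offset)
    count = rows * columns  # same expression as data[rows*columns]
    flat = struct.unpack_from(f"<{count}b", data, offset + 4)
    # Row-major grouping is an assumption of this sketch.
    return [list(flat[r * columns:(r + 1) * columns]) for r in range(rows)]


sample = struct.pack("<HH6b", 2, 3, 1, 2, 3, 4, 5, -6)
grid = read_byte_grid(sample)
```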
You can nest one structure within another by using a command of the following form:
struct <<STRUCTURE_NAME>> <<VARIABLE_NAME>>;
In the following example, an ARGB structure is defined and followed by the definition of a palette structure, which as defined below contains an array of 256 ARGB structures.
struct ARGB
{
BYTE alpha;
BYTE red;
BYTE green;
BYTE blue;
};
struct palette
{
struct ARGB entries[256];
};
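The nesting above can be pictured as 256 consecutive four byte records. The following Python sketch reads such a palette; it assumes BYTE is an unsigned 8 bit value, and the names ARGB_STRUCT and read_palette are hypothetical.

```python
import struct
from typing import NamedTuple


class ARGB(NamedTuple):
    alpha: int
    red: int
    green: int
    blue: int


ARGB_STRUCT = struct.Struct("4B")  # four unsigned bytes, no padding


def read_palette(data, offset=0, count=256):
    """Read `count` consecutive ARGB records starting at `offset`."""
    end = offset + count * ARGB_STRUCT.size
    if end > len(data):
        raise ValueError("not enough data for the requested palette")
    return [ARGB(*fields) for fields in ARGB_STRUCT.iter_unpack(data[offset:end])]


palette = read_palette(bytes(range(256)) * 4)
```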
Enumerated types allow users to couple human readable names with numeric values. When an enumeration is defined, Hex Workshop allows you to view and work with the human readable names within the editing environment.
Enumerated types are defined similarly to structures. The general form begins with the "enum" keyword, followed by the enumeration name, an opening brace "{", enumeration definitions, a closing brace "}", and a trailing semicolon ";". Enumeration values must be specified in decimal.
enum <<ENUM_NAME>>
{
<<ENUM_DEFINITIONS>>
};
enum _FINDEX_SEARCH_OPS
{
FindExSearchNameMatch, // Value: 0
FindExSearchLimitToDirectories, // Value: 1
FindExSearchLimitToDevices, // Value: 2
FindExSearchMaxSearchOp // Value: 3
};
enum myOtherExample
{
MY_STARTING_VALUE = 100, // Value: 100
MY_NEXT_VALUE_1, // Value: 101
MY_NEXT_VALUE_2, // Value: 102
MY_RESET_VALUE = 200, // Value: 200
MY_NEXT_VALUE_3, // Value: 201
MY_OTHER_VALUE = 0x10 // Value: 16
};
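The numbering in these examples follows the usual C convention: a member without an explicit value takes the previous value plus one, and numbering starts at zero. The Python sketch below reproduces that rule under the assumption that Hex Workshop numbers members the same way; assign_enum_values is a hypothetical helper.

```python
def assign_enum_values(members):
    """Map names to values; `members` is a list of (name, explicit_or_None)."""
    values = {}
    next_value = 0
    for name, explicit in members:
        if explicit is not None:
            next_value = explicit  # an explicit value restarts the count
        values[name] = next_value
        next_value += 1
    return values


my_other_example = assign_enum_values([
    ("MY_STARTING_VALUE", 100),
    ("MY_NEXT_VALUE_1", None),
    ("MY_NEXT_VALUE_2", None),
    ("MY_RESET_VALUE", 200),
    ("MY_NEXT_VALUE_3", None),
    ("MY_OTHER_VALUE", 0x10),
])
```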
The typedef keyword allows users to create and name new data types. Each new data type must map to a basic built-in data type or a pre-defined type. The following example creates two 8 bit signed integer data types named "BYTE" and "byte". The struct myTypedefExample then uses a basic type and the newly created data types to create a structure with three 8 bit signed values (b1, b2, and b3).
typedef signed __int8 BYTE;
typedef BYTE byte;
struct myTypedefExample
{
signed __int8 b1;
BYTE b2;
byte b3;
} ;
Users can also typedef structures and enumerations. The typedef must be specified as part of the declaration. The example below defines a new data type, POINT, and a LINE structure. The LINE structure defines the start and end points of the line, once by naming the full structure (struct tagPOINT) and once by using the typedef name (POINT):
typedef struct tagPOINT
{
LONG x;
LONG y;
} POINT;
struct LINE
{
struct tagPOINT start;
POINT end;
};
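Because "struct tagPOINT" and "POINT" name the same type, both members of LINE occupy the same number of bytes. A small Python sketch of the resulting layout follows, assuming LONG is a little-endian signed 32 bit integer; the read_line helper is hypothetical.

```python
import struct
from typing import NamedTuple


class POINT(NamedTuple):
    x: int
    y: int


class LINE(NamedTuple):
    start: POINT  # declared as "struct tagPOINT start;"
    end: POINT    # declared as "POINT end;"


def read_line(data, offset=0):
    """Read two consecutive POINT records (four signed 32 bit integers)."""
    x1, y1, x2, y2 = struct.unpack_from("<4l", data, offset)
    return LINE(POINT(x1, y1), POINT(x2, y2))


line = read_line(struct.pack("<4l", 0, 0, 10, -5))
```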
When a #include directive is added to a structure library definition file, Hex Workshop inserts the literal contents of the designated structure library (the #include parameter) into the current structure library file at the point where the #include appears. For example, all of the sample structure libraries included with Hex Workshop reference a common library (standard-types.hsl) that consists of common and standard data types. The #include directive used in the sample libraries is provided below.
#include "standard-types.hsl"
The displayname pragma defines the friendly name of the structure. The friendly name is displayed on the structure viewer selection tool.
#pragma displayname("zip structures")
The fileextensions pragma defines which document extensions are appropriate for the structure definition. Multiple file extensions can be specified by using a semicolon as a delimiter. If the structure definition is loaded or open, Hex Workshop will automatically select the library whenever a compatible document is in focus.
#pragma fileextensions(".zip;.jar")
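The semicolon-delimited list can be read as a simple set of extensions. The Python sketch below shows one way such a match could work; the case-insensitive comparison is an assumption of this example and not documented Hex Workshop behavior, and matches_extensions is a hypothetical name.

```python
import os


def matches_extensions(document_name, extension_list):
    """Return True if the document's extension appears in the ';' list."""
    extensions = [
        ext.strip().lower() for ext in extension_list.split(";") if ext.strip()
    ]
    # Case-insensitive matching is an assumption of this sketch.
    return os.path.splitext(document_name)[1].lower() in extensions
```

For example, matches_extensions("lib.jar", ".zip;.jar") is true, while a document named "notes.txt" does not match.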
By default, an enumerated type is assumed to be a 4 byte (32 bit) data member. To define an enumeration for an 8 bit, 16 bit, or 64 bit enumerated type, use the #pragma directive to indicate the size. The #pragma directive sets the enumeration data size for all enumerations defined after the directive until a new #pragma is encountered.
#pragma enumsize(1) // Enums defined after here are 1 byte (8 bits)
<<enum definitions>>
#pragma enumsize(2) // Enums defined after here are 2 bytes (16 bits)
<<enum definitions>>
#pragma enumsize(4) // Enums defined after here are 4 bytes (32 bits)
<<enum definitions>>
#pragma enumsize(8) // Enums defined after here are 8 bytes (64 bits)
<<enum definitions>>
Structure definitions can switch the interpretation type of the enumeration elements using the enum_sign #pragma. Options include signed and unsigned. Enumerated types are signed by default.
#pragma enum_sign("signed")
<<enum definitions>>
#pragma enum_sign("unsigned")
<<enum definitions>>
The sign can be changed between enumeration definitions, but not in the middle of a definition.
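The combined effect of the enumsize and enum_sign settings can be sketched in Python as a choice of byte count and signedness when the stored value is read. Little-endian storage is an assumption here, and read_enum_value is a hypothetical helper; the defaults mirror the documented defaults of 4 bytes and signed.

```python
def read_enum_value(data, offset=0, size=4, signed=True):
    """Read one enumeration value of `size` bytes with the given signedness."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    raw = data[offset:offset + size]
    if len(raw) != size:
        raise ValueError("not enough data for the enumeration value")
    return int.from_bytes(raw, "little", signed=signed)
```

The same stored byte 0xFF reads as -1 with size=1 and signed=True, and as 255 with signed=False, which is why the sign setting matters when matching values against enumeration names.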
By default, Hex Workshop limits the length of arrays to 1024 members. This setting imposes an upper limit to bound how much processing Hex Workshop performs when evaluating structures. If a structure definition contains recursive/exponential data structures (e.g. arrays of arrays) and is applied to a corrupt file, Hex Workshop may appear to hang while processing.
#pragma maxarray(2048); // Increase the max array length to 2048
The Hex Workshop structure view can accommodate three popular types of strings:
- A variable length null (or zero) terminated string (zstring).
- A fixed length string.
- A variable length string where the length precedes the string content.
Examples of the three string types are shown below:
struct stringexample
{
zstring null_terminated_str;
char fixed_length_str[128];
struct length_first_str
{
WORD length;
char string[length];
};
};
Like array lengths, Hex Workshop limits the maximum string length to avoid run-away processing of corrupt files. By default, strings are limited to 512 characters; however, users may increase the number of characters to a maximum of 65,536 by using the maxstring #pragma. An example is shown below:
#pragma maxstring(128); // Decrease the max string length to 128
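The three string styles and the length cap can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, a little-endian WORD length is assumed, and raising an error when the cap is exceeded is a choice made for this sketch rather than a description of what Hex Workshop does.

```python
import struct

MAX_STRING = 512  # the documented default limit


def read_zstring(data, offset=0, max_len=MAX_STRING):
    """Read bytes up to a zero terminator found within max_len characters."""
    end = data.find(b"\x00", offset, offset + max_len + 1)
    if end == -1:
        raise ValueError("no terminator within max_len characters")
    return data[offset:end], end + 1


def read_fixed(data, offset, length):
    """Read exactly `length` bytes, as in char fixed_length_str[128]."""
    return data[offset:offset + length], offset + length


def read_length_first(data, offset=0, max_len=MAX_STRING):
    """Read a WORD length followed by that many bytes of content."""
    (length,) = struct.unpack_from("<H", data, offset)
    if length > max_len:
        raise ValueError("declared length exceeds max_len")
    start = offset + 2
    return data[start:start + length], start + length
```

Each helper returns the string bytes together with the offset of the next field, so the three can be chained in the same order as the members of stringexample.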
Hex Workshop v4.2 added basic support for C style bitfields. Bitfields allow users to view and edit a selection of consecutive bits as an independent integer. This reduces the need to count bits and convert binary to decimal. Hex Workshop supports 8, 16, 32, and 64 bit bitfields.
struct screenchar
{
unsigned short character : 8;
unsigned short color : 4;
unsigned short underline : 1;
unsigned short blink : 1;
};
In the example above, a 16 bit value is broken into an 8 bit character, a 4 bit color, and 1 bit flags for the underline and blink attributes.
The switch keyword allows users to define conditional structure elements based on another structure variable. The general form is:
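The bit counting that bitfields save can be seen in a manual Python equivalent that uses shifts and masks. The sketch assumes the first declared field occupies the least significant bits, which is common for C compilers on little-endian platforms but is not confirmed here for Hex Workshop; unpack_screenchar is a hypothetical name.

```python
def unpack_screenchar(value):
    """Split a 16 bit value into the four screenchar fields."""
    return {
        "character": value & 0xFF,        # bits 0-7
        "color": (value >> 8) & 0xF,      # bits 8-11
        "underline": (value >> 12) & 0x1,  # bit 12
        "blink": (value >> 13) & 0x1,      # bit 13
    }


fields = unpack_screenchar(0x1741)
```

Here 0x1741 decodes to character 0x41, color 7, underline set, and blink clear. The two highest bits are unused because the declared fields total 14 bits.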
switch (<<VARIABLE>>)
{
case <<INTEGER>>:
<<DATA TYPES>>
break;
case <<INTEGER>>:
<<DATA TYPES>>
break;
default:
<<DATA TYPES>>
break;
};
#include "standard-types.hsl"
struct conditionalExample
{
WORD type ;
WORD len ;
switch (type)
{
case 1:
WORD value ;
#pragma verify match_var_int("len", "2")
break;
case 2:
DWORD value ;
#pragma verify match_var_int("len", "4")
break;
default:
blob value[len];
break;
};
};
- Nested switch statements are not supported.
- The variable used in the switch statement must be an integer (8, 16, 32, or 64 bit).
- Each case statement must be followed by a break statement. The following is not supported:
...
case 1:
case 2:
<<DATA>>
break;
...
- Users must define variables in all case statements and include a "default" case to use variables outside of the case statement.
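The conditionalExample structure can be mirrored in Python to show how the "type" field selects the layout of "value". This is a sketch under assumptions: little-endian data, WORD as 16 bits, DWORD as 32 bits, and blob as raw bytes. The read_conditional helper and its "verified" flag are hypothetical stand-ins for the verify pragmas.

```python
import struct


def read_conditional(data, offset=0):
    """Read type and len, then interpret value according to type."""
    type_, length = struct.unpack_from("<HH", data, offset)
    body = offset + 4
    if type_ == 1:
        (value,) = struct.unpack_from("<H", data, body)  # WORD value
        verified = length == 2
    elif type_ == 2:
        (value,) = struct.unpack_from("<I", data, body)  # DWORD value
        verified = length == 4
    else:
        value = data[body:body + length]  # blob value[len]
        verified = True  # the default case carries no verify pragma
    return {"type": type_, "len": length, "value": value, "verified": verified}
```

Every branch assigns "value", which parallels the rule that a variable must be defined in all cases, including the default case, before it can be used outside the switch.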
Hex Workshop includes functions to build complex structure definitions.
addrof
Returns the address of a variable.
struct example
{
unsigned __int32 header;
blob filler[28];
blob data[ushortAt(addrof(header))];
};
In this example, the length of the "data" blob is set to the 16 bit value stored in the first two bytes of the "header" member.
sizeof
Returns the size of a variable in bytes.
struct example1 };
In this example, a "len" is defined as either a 16 bit or 32 bit value depending on a "version" field. The size of the "len" can then be used in a conditional switch statement.
ubyteAt
Returns an unsigned 8 bit integer at the specified address.
struct example
In this example, fixed-length strings are defined based on the byte value at locations 0xA/10 and 0x10/16.
ushortAt
Returns an unsigned 16 bit integer at the specified address.
struct example
In this example, the array length of "myString" is obtained by looking at the file position stored in "lengthFilePosition".
ulongAt
Returns an unsigned 32 bit integer at the specified address.
struct example
In this example, the length of values array is set to the 32 bit value at offset 10 plus 1.
byteflip16, byteflip32, byteflip64
Flips the byte order from Little Endian to Big Endian or Big Endian to Little Endian. The "16", "32", and "64" suffixes select 16 bit, 32 bit, or 64 bit byte swapping.
struct example
In this example, the length of the values array is set to the byte-swapped value at offset 10 plus 16.
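For readers who want to check by hand what these functions would return for a given file, the following Python helpers sketch comparable reads. They are assumptions for illustration only: the names are hypothetical, the reads assume little-endian storage, and an "address" is treated as a byte offset into the data.

```python
import struct


def ubyte_at(data, address):
    """Unsigned 8 bit integer at `address`."""
    return data[address]


def ushort_at(data, address):
    """Unsigned 16 bit integer at `address` (little-endian assumed)."""
    return struct.unpack_from("<H", data, address)[0]


def ulong_at(data, address):
    """Unsigned 32 bit integer at `address` (little-endian assumed)."""
    return struct.unpack_from("<I", data, address)[0]


def byteflip(value, bits):
    """Reverse the byte order of a 16, 32, or 64 bit unsigned value."""
    if bits not in (16, 32, 64):
        raise ValueError("bits must be 16, 32, or 64")
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")
```

For instance, byteflip(0x1234, 16) gives 0x3412, and applying byteflip twice with the same width returns the original value.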
The __this__ and __parent__ variables are used to reference data from the base of the structure or parent structure.
__parent__
Consider a nested structure that references a length variable within two different parent structures. One might want to define a structure that is based on a relative address from the start of a parent instead of the actual variable name. In the example below, the length variable is defined with different names in PARENT_1 and PARENT_2.
typedef struct NESTED
{
WORD someData[ushortAt(addrof(__parent__))] ;
} NESTED ;
typedef struct PARENT_1
{
WORD parent1Length;
NESTED nested ;
} PARENT_1;
typedef struct PARENT_2
{
WORD parent2Length;
NESTED nested ;
} PARENT_2;
In the example above, the "NESTED" structure defines an array of WORDs where the array length is hard-coded to the first WORD in any parent structure. This allows users to use this structure in definitions where the parent variable name may differ. In this case, PARENT_1's length is "parent1Length" and PARENT_2's length is "parent2Length".
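The idea of reading the length relative to the parent's start, rather than by name, can be sketched in Python. The sketch assumes little-endian WORDs; read_nested and read_parent are hypothetical helpers, and the same read_parent works for either parent because only the position of the length matters.

```python
import struct


def read_nested(data, parent_base, nested_offset):
    """Read the NESTED array; its length is the WORD at the parent's start."""
    (count,) = struct.unpack_from("<H", data, parent_base)
    return list(struct.unpack_from(f"<{count}H", data, nested_offset))


def read_parent(data, base=0):
    """Read a parent: one WORD length followed by the NESTED structure."""
    (length,) = struct.unpack_from("<H", data, base)
    return {"length": length, "nested": read_nested(data, base, base + 2)}


sample = struct.pack("<H3H", 3, 10, 20, 30)
parent = read_parent(sample)
```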
__this__
The __this__ keyword references the beginning of a structure. Consider a case where one has a WORD length followed by N words. You can define a structure to display this data in multiple ways:
typedef struct TYPICAL
{
WORD length;
WORD data[length];
} TYPICAL;
typedef struct ALTERNATIVE
{
WORD data[ushortAt(addrof(__this__))+1];
} ALTERNATIVE;
In this example, the length of the "ALTERNATIVE" structure's array is effectively read from the data[0] position. The "TYPICAL" structure is clearly preferred; however, cases exist where using __this__ is useful.
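The relationship between the two layouts can be checked with a short Python sketch. It assumes little-endian WORDs, and both helper names are hypothetical. In the alternative form the length word is itself element 0 of the array, which is why one is added to the count.

```python
import struct


def read_typical(data, base=0):
    """WORD length followed by `length` WORDs of data."""
    (length,) = struct.unpack_from("<H", data, base)
    return length, list(struct.unpack_from(f"<{length}H", data, base + 2))


def read_alternative(data, base=0):
    """A single array whose first element doubles as the length."""
    (length,) = struct.unpack_from("<H", data, base)
    return list(struct.unpack_from(f"<{length + 1}H", data, base))


sample = struct.pack("<H2H", 2, 7, 8)
```

With this sample, read_typical gives a length of 2 and data [7, 8], while read_alternative gives [2, 7, 8] from the same bytes.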
The verify #pragma allows users to verify basic pre-conditions on structures (sanity checking). If a verify pre-condition fails while adding a locked structure, the user is alerted and asked if they would like to continue. If the user continues or a floating structure fails validation, the structure viewer will display "verify failed" until the document data is modified or structure is repositioned so that validation passes.
Using the .zip file LocalFileHeader example above, the user could be presented with the following:
The syntax is as follows:
#pragma verify <<MATCHTYPE>>(<<VARIABLE>>, <<VALUE>>)
NOTE: The variable and value must be quoted.
match_var_int
Compares a variable against the specified integer value. 8, 16, 32, and 64 bit values are supported. The value can be declared in either decimal or hex. Examples:
#pragma verify match_var_int("Signature[0]", "0x50")
#pragma verify match_var_int("Signature[1]", "0x4B")
#pragma verify match_var_int("Signature[2]", "0x03")
#pragma verify match_var_int("Signature[3]", "0x04")
#pragma verify match_var_int("magic", "0xCAFEBABE")
match_var_str
Compares a variable against the specified string value. Example:
#pragma verify match_var_str("method", "add")
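The kind of comparison a verify pragma expresses can be sketched in Python as a list of checks run against already-parsed field values. Everything here is illustrative: the function names reuse the match type names for readability, the "fields" dictionary keyed by variable name is an assumption of this sketch, and it does not reproduce Hex Workshop's dialog or "verify failed" display.

```python
def parse_quoted_int(text):
    """Convert a quoted decimal or 0x-prefixed hex value to an integer."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def match_var_int(fields, variable, value):
    """True if the named field equals the quoted integer value."""
    return fields.get(variable) == parse_quoted_int(value)


def match_var_str(fields, variable, value):
    """True if the named field equals the quoted string value."""
    return fields.get(variable) == value


def failed_checks(fields, checks):
    """Return the (variable, value) pairs whose check did not pass."""
    return [(var, val) for check, var, val in checks if not check(fields, var, val)]


signature_checks = [
    (match_var_int, "Signature[0]", "0x50"),
    (match_var_int, "Signature[1]", "0x4B"),
    (match_var_int, "Signature[2]", "0x03"),
    (match_var_int, "Signature[3]", "0x04"),
]
```

An empty result from failed_checks corresponds to a structure whose pre-conditions all hold.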
The lockAt and floatAt pragma commands are used to suggest an initial locked or floating offset. If specified, the add structure dialog will pre-populate the offset and float/locked options when adding a structure.
#pragma lockAt(offset)
#pragma floatAt(offset)
Where offset can be a decimal (e.g. 32) or hexadecimal value (e.g. 0x20).
struct LocalFileHeader
{
#pragma lockAt(0x00000010)
char Signature[4]; // PK<0x03><0x04>
#pragma verify match_var_int("Signature[0]", "0x50")
#pragma verify match_var_int("Signature[1]", "0x4B")
#pragma verify match_var_int("Signature[2]", "0x03")
#pragma verify match_var_int("Signature[3]", "0x04")
WORD VersionNeededToExtract;
WORD GeneralPurposeBitFlag;
WORD CompressionMethod;
DOSTIME LastModFileTime;
DOSDATE LastModFileDate;
DWORD Crc32;
DWORD CompressedSize;
DWORD UncompressedSize;
WORD FileNameLength;
WORD ExtraFieldLength;
};
See also Structure Viewer Overview, Adding a Structure, Removing a Structure and Basic Structure Data Types.