We (and the users) want field names in machine-readable output to be (lame pun ahead) safe, sane, and consistent.
- Safe: usable as variable or object member names in most programming languages if needed.
- Sane: don't make the reader wonder why anyone would format a field name that way.
- Consistent: use the same character set and separator convention throughout.
Unfortunately, many data sources aren't even internally consistent. For example, in /proc/cpuinfo you can find vendor_id (underscore-separated words) and core id (space-separated words) right next to each other.
We have to do our own normalization to keep such inconsistencies out of our output:
- Only underscores are permitted as separators. Whitespace, hyphens, slashes, dots, commas, parentheses, brackets, and braces are all replaced with underscores.
- Apart from the underscore separator, only lowercase ASCII letters and digits are permitted in identifiers.
- Uppercase letters are converted to lowercase.
- Other characters may be converted to a textual description (e.g., % becomes percentage).
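
The rules above can be sketched as a small Python helper. This is only an illustration, not the project's actual implementation: the function name `normalize_field_name` and the `SYMBOL_NAMES` table are hypothetical, and only the `%` to `percentage` mapping comes from the text. Treating backslashes as slashes, collapsing repeated underscores, trimming leading and trailing underscores, and silently dropping unmapped characters are additional assumptions.

```python
import re

# Hypothetical symbol table; only "%" -> "percentage" is taken from the rules above.
SYMBOL_NAMES = {"%": "percentage"}


def normalize_field_name(name: str) -> str:
    """Turn a raw field name into a lowercase, underscore-separated identifier."""
    # Spell out known symbols, padded with underscores so they stay separate words.
    for symbol, word in SYMBOL_NAMES.items():
        name = name.replace(symbol, f"_{word}_")

    # Uppercase letters are converted to lowercase.
    name = name.lower()

    # Whitespace, hyphens, slashes (backslashes included, as an assumption),
    # dots, commas, parens, brackets, and braces become underscores.
    name = re.sub(r"[\s\-/\\.,()\[\]{}]+", "_", name)

    # Assumption: any remaining character outside [a-z0-9_] is dropped.
    name = re.sub(r"[^a-z0-9_]", "", name)

    # Assumption: collapse runs of underscores and trim them from both ends.
    return re.sub(r"_+", "_", name).strip("_")


if __name__ == "__main__":
    for raw in ("vendor_id", "core id", "CPU MHz", "Use%", "RX-OK (bytes)"):
        print(f"{raw!r} -> {normalize_field_name(raw)!r}")
```

With this sketch, the two /proc/cpuinfo fields mentioned earlier end up in the same style: vendor_id stays as it is and core id becomes core_id.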