Coding Challenge — Building a JSON Parser in PHP
Let’s build a JSON parser in PHP, following the steps outlined in the challenge. We’ll implement a basic lexer and parser to handle the progressively more complex JSON objects described in each step. Here’s the approach for each step:
Step 1: Parse a Simple JSON Object ‘{}’
Objective:
- Parse the string
'{}'
to recognize it as a valid JSON object. - Identify invalid JSON and report accordingly.
Solution:
- Create a simple lexer to tokenize the input string.
- Build a basic parser to recognize the tokens and determine if they form a valid JSON object.
Implementation:
<?php
function lex($input) {
$tokens = [];
$length = strlen($input);
for ($i = 0; $i < $length; $i++) {
$char = $input[$i];
if ($char === '{' || $char === '}') {
$tokens[] = $char;
} elseif (!ctype_space($char)) {
// Invalid character for step 1
return null;
}
}
return $tokens;
}
function parse($tokens) {
if ($tokens === null) {
return false;
}
if (count($tokens) === 2 && $tokens[0] === '{' && $tokens[1] === '}') {
return true;
}
return false;
}
function validate_json($input) {
$tokens = lex($input);
if (parse($tokens)) {
echo "Valid JSON\n";
exit(0);
} else {
echo "Invalid JSON\n";
exit(1);
}
}
// Test the function with input
$input = '{}'; // Replace this with file reading in practice
validate_json($input);
?>
Step 2: Parse JSON Object with String Keys and Values
Objective:
- Extend the parser to handle JSON objects with string keys and string values, like
{"key": "value"}
.
Implementation:
<?php
function lex($input) {
$tokens = [];
$length = strlen($input);
$i = 0;
while ($i < $length) {
$char = $input[$i];
if ($char === '{' || $char === '}' || $char === ':' || $char === ',') {
$tokens[] = $char;
$i++;
} elseif ($char === '"') {
$end = strpos($input, '"', $i + 1);
if ($end === false) {
return null;
}
$tokens[] = substr($input, $i, $end - $i + 1);
$i = $end + 1;
} elseif (ctype_space($char)) {
$i++;
} else {
return null;
}
}
return $tokens;
}
function parse($tokens) {
if ($tokens === null) {
return false;
}
$state = 'START';
foreach ($tokens as $token) {
switch ($state) {
case 'START':
if ($token === '{') {
$state = 'KEY';
} else {
return false;
}
break;
case 'KEY':
if (preg_match('/^".*"$/', $token)) {
$state = 'COLON';
} else {
return false;
}
break;
case 'COLON':
if ($token === ':') {
$state = 'VALUE';
} else {
return false;
}
break;
case 'VALUE':
if (preg_match('/^".*"$/', $token)) {
$state = 'COMMA_OR_END';
} else {
return false;
}
break;
case 'COMMA_OR_END':
if ($token === ',') {
$state = 'KEY';
} elseif ($token === '}') {
$state = 'END';
} else {
return false;
}
break;
case 'END':
return false;
}
}
return $state === 'END';
}
function validate_json($input) {
$tokens = lex($input);
if (parse($tokens)) {
echo "Valid JSON\n";
exit(0);
} else {
echo "Invalid JSON\n";
exit(1);
}
}
// Test the function with input
$input = '{"key": "value"}'; // Replace this with file reading in practice
validate_json($input);
?>
Step 3: Extend to Handle Various Data Types
Objective:
- Extend the parser to handle string, numeric, boolean, and null values.
Implementation:
<?php
function lex($input) {
$tokens = [];
$length = strlen($input);
$i = 0;
while ($i < $length) {
$char = $input[$i];
if ($char === '{' || $char === '}' || $char === ':' || $char === ',' || $char === '-' || ($char >= '0' && $char <= '9')) {
$tokens[] = $char;
$i++;
} elseif ($char === '"') {
$end = strpos($input, '"', $i + 1);
if ($end === false) {
return null;
}
$tokens[] = substr($input, $i, $end - $i + 1);
$i = $end + 1;
} elseif (ctype_space($char)) {
$i++;
} elseif (preg_match('/^(true|false|null)/', substr($input, $i), $match)) {
$tokens[] = $match[0];
$i += strlen($match[0]);
} else {
return null;
}
}
return $tokens;
}
function parse($tokens) {
if ($tokens === null) {
return false;
}
$state = 'START';
foreach ($tokens as $token) {
switch ($state) {
case 'START':
if ($token === '{') {
$state = 'KEY';
} else {
return false;
}
break;
case 'KEY':
if (preg_match('/^".*"$/', $token)) {
$state = 'COLON';
} else {
return false;
}
break;
case 'COLON':
if ($token === ':') {
$state = 'VALUE';
} else {
return false;
}
break;
case 'VALUE':
if (preg_match('/^".*"$/', $token) || is_numeric($token) || $token === 'true' || $token === 'false' || $token === 'null') {
$state = 'COMMA_OR_END';
} else {
return false;
}
break;
case 'COMMA_OR_END':
if ($token === ',') {
$state = 'KEY';
} elseif ($token === '}') {
$state = 'END';
} else {
return false;
}
break;
case 'END':
return false;
}
}
return $state === 'END';
}
function validate_json($input) {
$tokens = lex($input);
if (parse($tokens)) {
echo "Valid JSON\n";
exit(0);
} else {
echo "Invalid JSON\n";
exit(1);
}
}
// Test the function with input
$input = '{
"key1": true,
"key2": false,
"key3": null,
"key4": "value",
"key5": 101
}'; // Replace this with file reading in practice
validate_json($input);
?>
Step 4: Parse JSON Object with Object and Array Values
Objective:
- Extend the parser to handle nested objects and arrays.
Implementation:
<?php
function lex($input) {
$tokens = [];
$length = strlen($input);
$i = 0;
while ($i < $length) {
$char = $input[$i];
if ($char === '{' || $char === '}' || $char === '[' || $char === ']' || $char === ':' || $char === ',' || $char === '-') {
$tokens[] = $char;
$i++;
} elseif ($char === '"') {
$end = strpos($input, '"', $i + 1);
if ($end === false) {
return null;
}
$tokens[] = substr($input, $i, $end - $i + 1);
$i = $end + 1;
} elseif (ctype_space($char)) {
$i++;
} elseif (preg_match('/^(true|false|null)/', substr($input, $i), $match)) {
$tokens[] = $match[0];
$i += strlen($match[0]);
} elseif (preg_match('/^-?\d+(\.\d+)?([eE][-+]?\d+)?/', substr($input, $i), $match)) {
$tokens[] = $match[0];
$i += strlen($match[0]);
} else {
return null;
}
}
return $tokens;
}
function parse_value(&$tokens, &$i) {
if ($i >= count($tokens)) return false;
if (preg_match('/^".*"$/', $tokens[$i]) || is_numeric($tokens[$i]) || $tokens[$i] === 'true' || $tokens[$i] === 'false' || $tokens[$i] === 'null') {
$i++;
return true;
} elseif ($tokens[$i] === '{') {
return parse_object($tokens, $i);
} elseif ($tokens[$i] === '[') {
return parse_array($tokens, $i);
}
return false;
}
function parse_object(&$tokens, &$i) {
if ($i >= count($tokens) || $tokens[$i] !== '{') {
return false;
}
$i++;
if ($i < count($tokens) && $tokens[$i] === '}') {
$i++;
return true;
}
while (true) {
if ($i >= count($tokens) || !preg_match('/^".*"$/', $tokens[$i])) {
return false;
}
$i++;
if ($i >= count($tokens) || $tokens[$i] !== ':') {
return false;
}
$i++;
if (!parse_value($tokens, $i)) {
return false;
}
if ($i < count($tokens) && $tokens[$i] === '}') {
$i++;
return true;
}
if ($i >= count($tokens) || $tokens[$i] !== ',') {
return false;
}
$i++;
}
}
function parse_array(&$tokens, &$i) {
if ($i >= count($tokens) || $tokens[$i] !== '[') {
return false;
}
$i++;
if ($i < count($tokens) && $tokens[$i] === ']') {
$i++;
return true;
}
while (true) {
if (!parse_value($tokens, $i)) {
return false;
}
if ($i < count($tokens) && $tokens[$i] === ']') {
$i++;
return true;
}
if ($i >= count($tokens) || $tokens[$i] !== ',') {
return false;
}
$i++;
}
}
function parse($tokens) {
if ($tokens === null) {
return false;
}
$i = 0;
if (!parse_object($tokens, $i)) {
return false;
}
return $i === count($tokens);
}
function validate_json($input) {
$tokens = lex($input);
if (parse($tokens)) {
echo "Valid JSON\n";
exit(0);
} else {
echo "Invalid JSON\n";
exit(1);
}
}
// Test the function with input
$input = '{
"key": "value",
"key-n": 101,
"key-o": {},
"key-l": []
}'; // Replace this with file reading in practice
validate_json($input);
?>
Step 5: Add Custom Tests
Objective:
- Write additional tests to ensure robustness of the parser.
Example Tests:
<?php
function lex($input) {
$tokens = [];
$length = strlen($input);
$i = 0;
while ($i < $length) {
$char = $input[$i];
if ($char === '{' || $char === '}' || $char === '[' || $char === ']' || $char === ':' || $char === ',' || $char === '-') {
$tokens[] = $char;
$i++;
} elseif ($char === '"') {
$end = strpos($input, '"', $i + 1);
if ($end === false) {
return null;
}
$tokens[] = substr($input, $i, $end - $i + 1);
$i = $end + 1;
} elseif (ctype_space($char)) {
$i++;
} elseif (preg_match('/^(true|false|null)/', substr($input, $i), $match)) {
$tokens[] = $match[0];
$i += strlen($match[0]);
} elseif (preg_match('/^-?\d+(\.\d+)?([eE][-+]?\d+)?/', substr($input, $i), $match)) {
$tokens[] = $match[0];
$i += strlen($match[0]);
} else {
return null;
}
}
return $tokens;
}
function parse_value(&$tokens, &$i) {
if ($i >= count($tokens)) return false;
if (preg_match('/^".*"$/', $tokens[$i]) || is_numeric($tokens[$i]) || $tokens[$i] === 'true' || $tokens[$i] === 'false' || $tokens[$i] === 'null') {
$i++;
return true;
} elseif ($tokens[$i] === '{') {
return parse_object($tokens, $i);
} elseif ($tokens[$i] === '[') {
return parse_array($tokens, $i);
}
return false;
}
function parse_object(&$tokens, &$i) {
if ($i >= count($tokens) || $tokens[$i] !== '{') {
return false;
}
$i++;
if ($i < count($tokens) && $tokens[$i] === '}') {
$i++;
return true;
}
while (true) {
if ($i >= count($tokens) || !preg_match('/^".*"$/', $tokens[$i])) {
return false;
}
$i++;
if ($i >= count($tokens) || $tokens[$i] !== ':') {
return false;
}
$i++;
if (!parse_value($tokens, $i)) {
return false;
}
if ($i < count($tokens) && $tokens[$i] === '}') {
$i++;
return true;
}
if ($i >= count($tokens) || $tokens[$i] !== ',') {
return false;
}
$i++;
}
}
function parse_array(&$tokens, &$i) {
if ($i >= count($tokens) || $tokens[$i] !== '[') {
return false;
}
$i++;
if ($i < count($tokens) && $tokens[$i] === ']') {
$i++;
return true;
}
while (true) {
if (!parse_value($tokens, $i)) {
return false;
}
if ($i < count($tokens) && $tokens[$i] === ']') {
$i++;
return true;
}
if ($i >= count($tokens) || $tokens[$i] !== ',') {
return false;
}
$i++;
}
}
function parse($tokens) {
if ($tokens === null) {
return false;
}
$i = 0;
if (!parse_object($tokens, $i)) {
return false;
}
return $i === count($tokens);
}
function validate_json($input) {
$tokens = lex($input);
if (parse($tokens)) {
echo "Valid JSON\n";
exit(0);
} else {
echo "Invalid JSON\n";
exit(1);
}
}
function run_tests() {
$tests = [
'{}' => true,
'{"key": "value"}' => true,
'{"key1": true, "key2": false, "key3": null, "key4": "value", "key5": 101}' => true,
'{"key": "value", "key-n": 101, "key-o": {}, "key-l": []}' => true,
'{"key": [1, 2, "string", null, true, false]}' => true,
'{"key": {"nestedKey": "nestedValue"}}' => true,
'{"key": "value", "missingComma" "value2"}' => false,
'{"key": "value", "extraComma": "value2",}' => false,
'{key: "value"}' => false,
'{"key": "value", "invalidKey": [}' => false,
'not a json string' => false,
'{
"key1": "value1",
"key2": 123,
"key3": true,
"key4": null,
"key5": {
"nestedKey1": "nestedValue1",
"nestedKey2": [1, 2, 3, "four", null, false, {"deepKey": "deepValue"}]
},
"key6": [true, false, [1, 2, {"innerKey": "innerValue"}], "string in array"],
"key7": {}
}' => true
];
foreach ($tests as $input => $expected) {
echo "Testing: $input\n";
$tokens = lex($input);
$result = parse($tokens);
$is_valid = $result === true;
echo $is_valid === $expected ? "PASS\n" : "FAIL\n";
echo "\n";
}
}
run_tests();
?>
Conclusion:
Building a JSON parser in PHP from scratch is a challenging but rewarding task. It involves understanding the intricacies of the JSON format and carefully handling different types of data, including strings, numbers, booleans, null values, nested objects, and arrays.
Here are the key takeaways from this project:
- Lexer Implementation: We started by creating a lexer to tokenize the JSON input. This step was crucial to break down the JSON string into manageable parts that our parser could process.
- Parser Construction: We developed a parser to validate and interpret these tokens. The parser functions (
parse_value
,parse_object
, andparse_array
) worked together to handle different JSON structures. - Handling Different Data Types: The parser was extended incrementally to handle various data types, including strings, numbers, booleans, nulls, and complex nested objects and arrays. This required careful consideration of JSON syntax rules.
- Robust Testing: Comprehensive testing was conducted to ensure the parser could handle both valid and invalid JSON inputs. This included simple cases as well as more complex, nested structures.
- Error Handling: The parser was designed to fail gracefully, providing clear messages for invalid JSON inputs and ensuring it adhered to expected behavior.
Through this process, we gained insights into the complexity of JSON parsing and the importance of robust error handling and testing. This project not only demonstrated the technical challenges involved but also highlighted the importance of incremental development and thorough validation.
Overall, building a JSON parser in PHP was an excellent exercise in both programming and understanding the JSON format in depth. It provided a strong foundation for more advanced topics in parsing and compiler design.