CSV File Validation In PHP (Part I)

Last week, I had to implement a CSV file validation in PHP. It’s a potentially very big file with several columns, where each one has its custom restrictions. For example, the “SKU” field must be an alphanumeric string no longer than 64 characters, the “IMAGES” field must contain image filename(s) and/or image URL(s) (comma separated values), and so on.

Eventually, I came up with a solution, which consists on using a strategy pattern, implementing a custom validation strategy for each field.

In order to fully ensure that the input data is correct, two kinds of validations must be performed:

  • Restrictions regarding the format, like allowed characters or maximun length.
  • Not only the data must be in the proper format, but also it must be valid from a semantical point of view. For instance, the SKUs must exist on the database, or each image URL specified must actually be an online image (an html, or an offline server shouldn’t pass the validation).

So, I split the requirements in two kind of validators: lexical and semantic.

This is how it looks like, hopefully it can be useful if you are in a similar situation:

Base validator:

abstract class Abstract_Validator {

/*
 * Tell whether the data is correct or not
 * Might change (correct) the $input value
 *
 * @return bool
 */
 abstract function validate(&$input);
 /*
 * Return a string with the details of the error and/or the allowed values for the current field.
 *
 * @param $input string|array
 * @return string
 */
 abstract function getErrorMsg($input);

 }

Base lexical validator with an implementation example:

abstract class Abstract_Lexical extends Abstract_Validator {

 protected $tokens;

 function __construct() {
 $this->tokens = array();
 }

 public function getTokens() {
 return $this->tokens;
 }
 public function validate(&$input) {
 $this->tokens[] = $input;
 }

/*
* Return the last part of the error message
* Can be called from a child class to complete the returned message.
*
* @param $input string
* @return string
*/
 public function getErrorMsg($input) {
   return sprintf('but "%s" provided.', $input);
 }
}

/*
 * EAN is an optional field, but if it's not empty, it must be a valid code.
 */
class Lexical_Ean extends Abstract_Lexical {
 protected function checkEAN($fullcode) {
 $code = substr($fullcode, 0, -1);
 $checksum = 0;
 foreach (str_split(strrev($code)) as $pos => $val) {
 $checksum += $val * (3 - 2 * ($pos % 2));
 }
 return (10 - ($checksum % 10)) % 10 == substr($fullcode,-1);
 }
 public function validate(&$input) {
 parent::validate(&$input);
 return strlen($input) == 0 || $this->checkEAN($input);
 }

 public function getErrorMsg($input) {
 return 'allowed values: valid EAN code (13 digits number), ' . parent::getErrorMsg($input);
 }
 }

Base Semantic validator, with an implementation example:

abstract class Abstract_Semantic extends Abstract_Validator {
 protected $dbLink;
 protected $errors;
 protected static $FIELD;
 protected static $QUERY;
 protected static $TYPE = 'validateExisting';

 function __construct($dbLink = null) {
 $errors = array();
 if($dbLink) {
 $this->dbLink = $dbLink;
 }
 }

 /*
 * Default validator
 * Ensure the extended class has defined the required values
 * Perform the query and pass the result to the proper method
 *
 * @param $input simple array with data
 * @return bool
 */
 public function validate(&$input) {

 if (!$this->dbLink || !static::$QUERY || !static::$FIELD || !static::$TYPE) {
 $this->errors[] = 'Error validating ' . get_class($this) . ', incomplete data was provided.';
 return false;
 }
 $field = static::$FIELD;
 $type = static::$TYPE;
 $list = $this->dbLink->query(sprintf(static::$QUERY, implode($input, '","')));
 return $this->$type($list, $field, $input);
 }

 /*
 * Ensure all requested items where present in the database
 * Otherwise, log all not found items
 *
 * @param list mysqli_result
 * @param $field field to check
 * @param $input array with data
 * @return bool
 */
 protected function validateExisting($list, $field, $input) {
 $ret = $list->num_rows == count($input);
 if(!$ret) {
 $found = array();
 while($obj = $list->fetch_object()){
 $found[] = $obj->$field;
 }
 $this->errors = array_diff($input, $found);
 }
 return $ret;
 }

 /*
 * Return the errors found during the validation
 * @return array
 */
 public function getErrors() {
 return $this->errors;
 }

 /**
 * Return the description message for the error.
 *
 * @param $input array
 * @return string
 */
 public function getErrorMsg($input) {
 if (static::$TYPE == 'validateExisting') {
 $exist = 'must exist';
 } else $exist = 'cannot exist';

 return sprintf('%s(s) "%s" %s on the database.', str_replace('Semantic_', '', get_class($this)), implode($input, ', '), $exist);
 }
 }

/**
 * Store field must correspond to an existing store in the database.
 */
class Semantic_Store extends Abstract_Semantic {
 protected static $QUERY = 'SELECT code from stores where code in("%s")';
 protected static $FIELD = 'code';
 }

That’s all for the moment, on next post I’ll explain how I did join all this, iterating over the CSV file, stay tuned!