Jenkins for PHP from scratch – I

jenkins logoThis is meant to be a full guide of how to set up jenkins for php, starting from the very beginning until the end. As it might become a very long guide, I’ll try to simplify it as much as possible, and I’ll split it in two or three posts.

All the guide was made under Ubuntu 12.10 though it should work fairly well in most of the latest Ubuntu versions, and apart from the apt-get commands the rest should be the almost the same for any unix operating system.

For those who doesn’t know what Jenkins is, here you are a very concise description:

Jenkins is an open source continuous integration tool written in Java.

It’s a very powerful tool and definitely a must have for any development environment that attempts to achieve continuous integration. If you are still not convinced have a look at wikipedia link, and read the fully explanation.

Ok, said that, let’s start, assuming the worst case scenario, where you have your own php application running, but you don’t even have a single automated/unit test.

The first thing you should probably do is install phpunit and start writting some tests. In my case I installed php-test-helpers as well. The installation should be pretty straightforward:

sudo pear config-set auto_discover 1
sudo pear install -a pear.phpunit.de/PHPUnit
sudo pear channel-discover pear.phpunit.de
sudo pecl install phpunit/test_helpers

I’ll skip the part when you learn how to write your own tests :).

Assuming you already have a bunch of tests that cover most of your application code, and you feel like you are ready to install jenkins, let’s do it:

wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins

Jenkins should be already installed and its service should  have started automatically. It has a web interface at 8080 port by default, so now you should be able to browse it by typing the following url in your browser: http://localhost:8080

Now you should install several plugins that will probably be useful later. You might eventually get rid of some of them, but I advice you to give them a try, as they can provide you with very interesting information about your application. Typically, it’s recommended to install the following plugins:

– Jenkins GIT plugin: As I use git and my application is stored in bitbucket.
– Checkstyle: Useful if you want to apply some codding standard checks over the code.
– Jenkins Clover PHP: For processing PHPUnit code coverage reports.
–  DRY: For checking for duplicated code in the application
– HTML Publisher: For publishing HTML reports, such as the ones generated by PHPUnit.
– JDepend: For analizing the code coupling and dependencies.
– Plot: For being able to draw/display results as graphs
– PMD: For performing further analysis of the code such as locating unused variables, unnecessary objects and many more.
– Violations: For gathering all the isues found by all the plugins listed above.
– xUnit: To collect PHPUnit test results.

Note that here we are just installing the plugins, but in many cases we will need to install some additional software to actually be able to generate the reports. The plugins will take care of reading and displaying in a friendly way the mentioned reports. But don’t worry, we’ll cross that bridge when we get to it, for the moment installing the plugins is enough.

Once you have installed all the plugins that you want, let’s create our first job and test that everything is working as expected.

Open Jenkins and click on New Job.
Type any job name, select “Build a free-style software project” and hit OK.
In the next window, select your Source Code Management (git in my case) and configure its settings.
Assuming you are using GIT, you have to enter the repository URL, and you’ll probably get an authentication error, unless you are using a public repository.
Note that jenkins runs as an isolated service and it even creates its own pseudo-username. The easiest way to authenticate on the git repository from jenkins is either creating a new public key for jenkins username or just using your own keys. This is how I managed to get it working:
sudo cp .ssh/* /var/lib/jenkins/.ssh/
sudo chown jenkins:nogroup /var/lib/jenkins/.ssh/
This should fix the problem.
Now you just need to select your branch and if you want to browse the code from jenkins, just select the repository browser and enter the url.

Then, under the build section, click on Add build step, select “Execute shell” and type in the following:

phpunit --log-junit 'reports/unitreport.xml' --coverage-html 'reports/coverage' --coverage-clover 'reports/coverage/coverage.xml' path_of_your_tests

Finally, under the post-build section, add the following actions:
– Publish JUnit test result report: Enter the previous used file path: tests/reports/unitreport.xml
– Publish Cover PHP Coverage report: Enter the coverage file path: tests/reports/coverage/coverage.xml
Enable the option Publish HTML Report and enter the path tests/reports/coverage/
– Publish xUnit test result report

Once you are finished doing that, save the job, and start a new build.
If everything works as expected, you should see a blue ball at the end, and and something like this:
blue ball buildIf you have any problems have a look at the Console Output and it should help you to detect the problem. Let me know if you have any other problem and are not able to fix it.

That’s all for now, next time we’ll show how to use all the installed plugins and how to automatically start new builds once new code is pushed to the repository.

Handling Arrays In PHP

arrays in phpToday, I’ll try to show you how to take advantage of the powerful built-in functions to handle arrays in PHP.

If you are a software developer, you will be used to handle data arrays in any language, so you’ll probably appreciate any given buil-in function that does all the dirty job for you.

This is not intended to be another tutorial of how to use arrays in PHP. I’d like to show you how combine the usage of the built-in functions, and how to create your own callable functions to create your custom filters, sorts or whatever you need.

If you have a look at php.net, PHP provide a lot of built-in functions, that allow you to transform, merge and/or split the data, accoding to your needs. Let’s start having a look at some of them individually, in order to see how to combine them later.


array_unique: Remove the duplicated elements from the array.

$a = array(0, 1, 1, 2);
array_unique($a); // array(0,1,2);

array_diff: Compare two arrays and return the difference respect from the first one.
$a = array(0, 1, 2);
$b = array(0, 2);
array_diff($a, $b); // array(1)

array_filter: Allow to apply a custom filter to the array. By default filters all elements that cast to boolean false.
$a = array(0 => true, 1 => false, 2 => 454, 3 => '');
array_filter($a); //array(0 => true, 2 => 454) Note that the key values are preserved

array_flip: Exchange the keys with the values.
$a = array("key" => "value");
array_flip($a); // array("value" => "key")

array_merge: Join all the arrays into a big one.
$a = array(1, 2, 3);
$b = array(3, 4, 5, 6);
array_merge($a, $b); //array(1, 2, 3, 3, 4, 5, 6) Note that duplicated elements are kept.

array_count_values: Show the number of ocurrences found for each value.
$a = array(1, 2, 1, 3, 2, 1);
array_count_values($a); //array(1 => 3, 2 => 2, 3 => 1);

array_map: Apply the given function to all the elements of the array. Check array_walk as well.
$a = array(" the", " developer ", " world ", " is ", " yours");
array_map("trim", $a); //array("the", "developer", "word", "is", "yours"); Note that the original array remains unchanged.

There are too many functions and I can’t show all of them here, because that would be another whole post. However you can always check the official page for more information and examples.

Now let’s see a couple of usages that can be useful many times.

For instance, imagine you are given an array of data, and first you want to do is clear it, and remove the empty values.

$data = array(" THE ", "developer ", "World", "", "IS", " yours ");
array_map('strtolower', array_filter(array_map('trim', $data)));
array (
  "the",
  "developer",
  "world",
  "is",
  "yours",
)

Firstly, trim all the elements, then filter the empty values, and finally transform all the data to lower case.
Another way of doing the same thing with an anonymous function:

array_filter(array_map(function($value) { return strtolower(trim($value));}, $data));

You might also try array_walk:

//Note that the following line will actually modify the original array
//We might use an anonymous function as well like in the previous example, now we use create_function
array_walk($data, create_function('&$val', '$val = strtolower(trim($val));'));
//Now we just need to clear out the empty values
array_filter($data);

Imagine we are working within a class, we want to apply a class method funcion to each array element, and we need to pass some extra variable as well.

class TheDeveloperWorldIsYours {
    public function parseData(&$data) {
        $prefix = 'Parsed: ';
        return array_walk($data, array($this, 'parseElement'), $prefix);
    }
    /**
     * parseElement
     * Prepend prefix to the given element if the key is even
     *
     * @param mixed $elem
     * @param mixed $key
     * @param mixed $prefix
     * @access protected
     * @return void
     */
    private function parseElement(&$elem, $key, $prefix) {
        $elem = $key % 2 == 0 ? $prefix . $elem : $elem;
    }
}

$data = array(1, 2, 3, 4, 5);
$a = new TheDeveloperWorldIsYours();
$a->parseData($data);
//Output:
array (
  0 => 'Parsed: 1',
  1 => 2,
  2 => 'Parsed: 3',
  3 => 4,
  4 => 'Parsed: 5',
)

Note that array_walk will call the given function with both the value and the key, in this order, and all the additional parameters will be placed afterwards.

Now, Imagine you have an array of data and you want to get only the duplicated elements.

$data = array('handling', 'arrays', 'in', 'php', 'handling', 'php', 'hi', 'php');
array_diff_key($data, array_unique($data));
array (
  "handling",
  "php",
  "php",
)

Note that we are comparing the array keys, because as we said previously, array_unique preserve them. So if we compare the original array with the unique, the repeated values (nor its keys) will not be present in the unique array, and we will get all the repeated elements as a result.
Imagine we just wanted to get once the repeated values. We might just apply an array_unique to the previous code, but we might do the following as well:

array_keys(array_filter(array_count_values($data), function($val) { return $val !=1;}));
array (
  'handling',
  'php',
)

And that’s about it. Now, each time you have to do some array filtering, sorting, transforming or whatever, firstly have a look at the php.net and think how you could do it without creating a loop :).

PHPUnit tips

Hopefully, all this phpunit tips can help you to test your own applications. If you have any questions, or suggestions, feel free to contact me!Some people have asked me to write something about testing, so I’ve decided to make this post and share some phpunit tips that I’ve learn.

This is not meant to be another tutorial of how to start writing your own tests with phpunit. Well, actually it is, but the purpose of my post is to explain you how to deal with real scenarios. If you try to search for phpunit examples, most places will show you how to create very simple test cases, but I haven’t seen many places explaining how to do more complex things. That’s exactly the purpose of this article. Moreover, If you think something’s missing, I encourage you to ask me about it, and I’ll try to append it to the post, or if it’s a big enough subject, I’ll create a whole new article about it.

Let’s begin from a very simple case, and we’ll get into something a bit more complicated step by step.
I assume that you already know what’s PHPUnit, and some basic concepts of how to use it, like for instance the dataproviders or mocks. If you think that you need some help to get started, have a look at PHPUnit’s official page and feel free to ask me, if needed.

Happy case in PHPUnit: Imagine you have an isolated class and you wanna test some of its methods.


/**
 * This code has been taken just as an example, from: http://snipplr.com/view.php?codeview&id=9024
 */
 class TheDeveloperWorldIsYours {
public static function generateUrlFromText($strText) {
 $strText = preg_replace('/[^A-Za-z0-9-]/', ' ', $strText);
 $strText = preg_replace('/ +/', ' ', $strText);
 $strText = trim($strText);
 $strText = str_replace(' ', '-', $strText);
 $strText = preg_replace('/-+/', '-', $strText);
 $strText = strtolower($strText);
 return $strText;
}
}

How would you test that? This might be an attempt:

/**
 * TheDeveloperWorldIsYours_Test
 *
 * @uses PHPUnit_Framework_TestCase
 */
class TheDeveloperWorldIsYours_Test extends PHPUnit_Framework_TestCase {

    public function setUp() {
    }

    protected static $inputs = array(
        array('   string    with  many   spaces   ', 'string-with-many-spaces'),
        array('stríng wïth wêird Àccents', 'str-ng-w-th-w-ird-ccents'),
        array('$peci4l: ch%r·ct3r$.', 'peci4l-ch-r-ct3r'),
        array('nice-string', 'nice-string'),
        array('testing+with+phpunit-is-cool', 'testing-with-phpunit-is-cool')
    );

    public static function providerFormat()
    {
        return self::$inputs;
    }

    /**
     * testFormat
     * Ensure that the string is being properly cleaned
     *
     * @dataProvider providerFormat
     * @param mixed $input
     * @param mixed $result
     * @access public
     * @return void
     */
    public function testFormat($input, $result) {
        $this->assertEquals($result, TheDeveloperWorldIsYours::generateUrlFromText($input));
    }
}

I assume that you have already set up some boostrap that takes care of including the required files (in this case, we just need to include the file where the class TheDeveloperWorldIsYours before executing the test).
If you run the test file with PHPUnit, you’ll notice that all the tests are passing, and you might want to add some corner cases to feel more comfortable, such as very long strings, empty strings, numbers and so. However, that’s out of the scope of this post (I could create a whole article about it). Nobody would know better than you the casuistic of your application, and you are the one responsible to add all the missing tests in each case.

So, in this example, we have the best scenario: The method is public, and even static, so we don’t need to instantiate the class in order to test the method. Also, the class doesn’t inherits from any other class, so we don’t have to worry about including all the required files.

Now, let’s assume we had the following scenario:

 class TheDeveloperWorldIsYours extends Some_Parent_Class {
protected function generateUrlFromText($strText) {
 $strText = preg_replace('/[^A-Za-z0-9-]/', ' ', $strText);
 $strText = preg_replace('/ +/', ' ', $strText);
 $strText = trim($strText);
 $strText = str_replace(' ', '-', $strText);
 $strText = preg_replace('/-+/', '-', $strText);
 $strText = strtolower($strText);
 if($strText === '') {
    die('Unable to generate a valid url from the given input');
 }
 return $strText;
}
}

Continue reading

CSV File Validation In PHP (Part II)

If you remember my previous post regarding CSV validation, I explained how to implement a different validator for each field, and phase, using the Strategy Design Pattern. For instance, in the lexical phase, I was checking for the correctness of the SKU field regarding the lenght of the string, and the absence of forbidden characters. In the semantic phase, depending on if it’s an update or an import task, I was checking for the existence or absence of the provided SKUs list in the database.

One of the big advantages of this implementation, is that it’s very easy to implement unitary tests for each validator, as they are almost totally independent from each other. I said almost because the return values of the lexical validators are the input of the semantic ones, so there is some kind of dependency there. For instance, the “store” field used to be a string containing the code of the store, but eventually, we allowed to put several comma-separated store codes on the field. This required changes not only in the regular expressions of the lexical validator, but also on any semantic validator that depended on the store tokens, as they’ll be expecting an input string, but instead, they’ll get an array.

I could keep on talking about the pros and cons of this approach, but it’s out of the scope of this post. Our goal is to implement a CSV file validation, so, let’s focus on the implementation of the main class which will take care or reading the CSV file, and execute the proper validator for each column.

For the shake of readability on the blog, I’ve un-indented one level the methods’ code. This is how the lexical phase looks like:

class Analyzer {

    const LEXICAL = 'lexical';
    const SEMANTIC = 'semantic';

    /**
     * Allowed profiles
     */
    private static $_PROFILES = array('import', 'update');

    private $_validators;
    private $_columnIndexes;
    private $_requiredFields;
    private $_profile;
    private $_errors;

function __construct(array $required, $profile) {
    $this->_validators = array();
    $this->_columnIndexes = array();
    $this->_errors = array();
    $this->_requiredFields = $required;
    $this->_profile = $profile;
    if(!in_array($profile, self::$_PROFILES)) {
        die('Unknown profile specified.');
    }
}
/*
 * Perform the lexical analysis over the CSV
 * Iterate over the file and exit if an error is found
 *
 * @param $file string full path of the csv file
 * @return bool whether the file is valid (from a lexical point of view) or not
 */
protected function lexical($file) {
    if (!file_exists($file))  {
        return false;
    } else {
        $delimiter = self::getDelimiter($file);
    }

    $handle = fopen($file, 'r');

    //Parse the first row, instantiate all the validators
    $valid = $this->parseFirstRow(fgetcsv($handle, 0, $delimiter));
    //Number of columns specified on the header
    $num_columns = sizeOf($this->_columnIndexes);
    //line number count
    $i = 1;

    while(($data = fgetcsv($handle, 0, $delimiter)) !== FALSE && $valid) {
        $errors = array();

        //For each column
        foreach ($data as $key => $value) {

            //Skip all columns without header
            if($key >= $num_columns) {
                break;
            }

            $value = trim($value);

            //Validate
            $errors[$this->_columnIndexes[$key]] = $this->_validators[$key]->validate($value);

            $valid = $valid && $errors[$this->_columnIndexes[$key]];
        }

        //If any error was found, exit
        if(!$valid) {
            $filtered_errors = array_keys($errors, false);
            if(count($filtered_errors) > 0) {
		//Store the errors founds on the current line
                $this->_errors[$i] = $filtered_errors;
            }
            break;
        }
        $i++;
    }
    fclose($handle);

    return $valid;
}
/**
 * parseFirstRow
 * Check that the column names aren't duplicated
 * Ensure all required fields are present
 * Create the instances of each validator.
 *
 * @param array $data
 * @access protected
 * @return bool
 */
protected function parseFirstRow(array $data) {
    $valid = true;
    //Clean the data
    $data = array_filter(array_map('trim', array_map('strtolower', $data)));

    //Ensure that there aren't duplicated columns
    $dupes = array_diff_key($data, array_unique($data));
    if(!empty($dupes)) {
        $this->_errors[] = sprintf('The following columns are duplicated on the CSV: "%s".', implode($dupes, '", "'));
        $valid = false;
    }

    //Ensure all required columns are present
    if($valid &&
        //The number of columns is lower than the required fields, we don't need to keep checking, some columns are missing.
        (count($data) < count($this->_requiredFields) ||
        //The number of optional fields must match with the number of fields that are not required, otherwise something is missing.
        count(array_diff($data, $this->_requiredFields)) !== (count($data) - count($this->_requiredFields)) ||
        //If the operation is an import, either categories or category_ids must be present
        ($this->_profile == 'import' && !(in_array('categories', $data) || in_array('category_ids', $data))))) {

            $required = implode(array_diff($this->_requiredFields, $data), '", "');
            if($this->_profile == 'import' && !in_array('category_ids', $data) && !in_array('categories', $data)) {
                if($required) {
                    $required .= '" and "categories" or "category_ids';
                } else {
                    $required = 'categories" or "category_ids';
                }
            }
            $this->_errors[] = sprintf('The following columns are missing on the CSV: "%s".', $required);
            $valid = false;
        }

    if($valid) {
        //Instantiate all the lexical validators
        foreach ($data as $key => $value) {
            $this->_validators[$key] = new ValidatorContext(Analyzer::LEXICAL, $value, $this->_profile);
            $this->_columnIndexes[$key] = $value;
        }
    }
    return $valid;
}
/**
 * getDelimiter
 * Try to detect the delimiter character on a CSV file, by reading the first row.
 *
 * @param mixed $file
 * @access public
 * @return string
 */
public static function getDelimiter($file) {
    $delimiter = false;
    $line = '';
    if($f = fopen($file, 'r')) {
        $line = fgets($f); // read until first newline
        fclose($f);
    }
    if(strpos($line, ';') !== FALSE && strpos($line, ',') === FALSE) {
        $delimiter = ';';
    } else if(strpos($line, ',') !== FALSE && strpos($line, ';') === FALSE) {
        $delimiter = ',';
    } else {
        die('Unable to find the CSV delimiter character. Make sure you use "," or ";" as delimiter and try again.');
    }
    return $delimiter;
}
}

I hope that the code is pretty much self-explanatory, but anyway I’ll summarize the process.

It all begins with the instantiation of the Analyzer class, where you provide the profile, and the required fields for this task.
Then, the lexical method is called with the full path of the CSV file to be processed. If the file exists, we firstly try to detect the delimiter used, which should be a comma or a semicolon.

Afterwards, the first row is parsed by cleaning the data, and checking for the correctness of the fields, ensuring there are not duplicated fields, and that all the required fields are present. If everything went as expected, we call the ValidatorContext class providing the fieldname, and we’ll get the proper validator instatiated.

Finally, we iterate over the CSV, ensuring that all the validations are passed, stopping the process in case of error, and saving the field and the line where the problem was found.

That’s all from now, in the next post I’ll show the last piece of the puzzle, which is the implementation of the ValidatorContext class, where we will be able to unify the whole process.

CSV File Validation In PHP (Part I)

Last week, I had to implement a CSV file validation in PHP. It’s a potentially very big file with several columns, where each one has its custom restrictions. For example, the “SKU” field must be an alphanumeric string no longer than 64 characters, the “IMAGES” field must contain image filename(s) and/or image URL(s) (comma separated values), and so on.

Eventually, I came up with a solution, which consists on using a strategy pattern, implementing a custom validation strategy for each field.

In order to fully ensure that the input data is correct, two kinds of validations must be performed:

  • Restrictions regarding the format, like allowed characters or maximun length.
  • Not only the data must be in the proper format, but also it must be valid from a semantical point of view. For instance, the SKUs must exist on the database, or each image URL specified must actually be an online image (an html, or an offline server shouldn’t pass the validation).

So, I split the requirements in two kind of validators: lexical and semantic.

This is how it looks like, hopefully it can be useful if you are in a similar situation:

Base validator:


abstract class Abstract_Validator {

/*
 * Tell whether the data is correct or not
 * Might change (correct) the $input value
 *
 * @return bool
 */
 abstract function validate(&$input);
 /*
 * Return a string with the details of the error and/or the allowed values for the current field.
 *
 * @param $input string|array
 * @return string
 */
 abstract function getErrorMsg($input);

 }

Base lexical validator with an implementation example:

abstract class Abstract_Lexical extends Abstract_Validator {

 protected $tokens;

 function __construct() {
 $this->tokens = array();
 }

 public function getTokens() {
 return $this->tokens;
 }
 public function validate(&$input) {
 $this->tokens[] = $input;
 }

/*
* Return the last part of the error message
* Can be called from a child class to complete the returned message.
*
* @param $input string
* @return string
*/
 public function getErrorMsg($input) {
   return sprintf('but "%s" provided.', $input);
 }
}

/*
 * EAN is an optional field, but if it's not empty, it must be a valid code.
 */
class Lexical_Ean extends Abstract_Lexical {
 protected function checkEAN($fullcode) {
 $code = substr($fullcode, 0, -1);
 $checksum = 0;
 foreach (str_split(strrev($code)) as $pos => $val) {
 $checksum += $val * (3 - 2 * ($pos % 2));
 }
 return (10 - ($checksum % 10)) % 10 == substr($fullcode,-1);
 }
 public function validate(&$input) {
 parent::validate(&$input);
 return strlen($input) == 0 || $this->checkEAN($input);
 }

 public function getErrorMsg($input) {
 return 'allowed values: valid EAN code (13 digits number), ' . parent::getErrorMsg($input);
 }
 }

Base Semantic validator, with an implementation example:

abstract class Abstract_Semantic extends Abstract_Validator {
 protected $dbLink;
 protected $errors;
 protected static $FIELD;
 protected static $QUERY;
 protected static $TYPE = 'validateExisting';

 function __construct($dbLink = null) {
 $errors = array();
 if($dbLink) {
 $this->dbLink = $dbLink;
 }
 }

 /*
 * Default validator
 * Ensure the extended class has defined the required values
 * Perform the query and pass the result to the proper method
 *
 * @param $input simple array with data
 * @return bool
 */
 public function validate(&$input) {

 if (!$this->dbLink || !static::$QUERY || !static::$FIELD || !static::$TYPE) {
 $this->errors[] = 'Error validating ' . get_class($this) . ', incomplete data was provided.';
 return false;
 }
 $field = static::$FIELD;
 $type = static::$TYPE;
 $list = $this->dbLink->query(sprintf(static::$QUERY, implode($input, '","')));
 return $this->$type($list, $field, $input);
 }

 /*
 * Ensure all requested items where present in the database
 * Otherwise, log all not found items
 *
 * @param list mysqli_result
 * @param $field field to check
 * @param $input array with data
 * @return bool
 */
 protected function validateExisting($list, $field, $input) {
 $ret = $list->num_rows == count($input);
 if(!$ret) {
 $found = array();
 while($obj = $list->fetch_object()){
 $found[] = $obj->$field;
 }
 $this->errors = array_diff($input, $found);
 }
 return $ret;
 }

 /*
 * Return the errors found during the validation
 * @return array
 */
 public function getErrors() {
 return $this->errors;
 }

 /**
 * Return the description message for the error.
 *
 * @param $input array
 * @return string
 */
 public function getErrorMsg($input) {
 if (static::$TYPE == 'validateExisting') {
 $exist = 'must exist';
 } else $exist = 'cannot exist';

 return sprintf('%s(s) "%s" %s on the database.', str_replace('Semantic_', '', get_class($this)), implode($input, ', '), $exist);
 }
 }

/**
 * Store field must correspond to an existing store in the database.
 */
class Semantic_Store extends Abstract_Semantic {
 protected static $QUERY = 'SELECT code from stores where code in("%s")';
 protected static $FIELD = 'code';
 }

That’s all for the moment, on next post I’ll explain how I did join all this, iterating over the CSV file, stay tuned!