Categories
PHP

Coding conventions: PHP

Code structure

Assignment expressions

Using assignment as an expression is surprising to the reader and looks like an error. Do not write code like this:

if ( $a = foo() ) {
    bar();
}

Space is cheap, and you’re a fast typist, so instead use:

$a = foo();
if ( $a ) {
    bar();
}

Using assignment in a while() clause used to be legitimate, for iteration:

$res = $dbr->query( 'SELECT * FROM some_table' );
while ( $row = $dbr->fetchObject( $res ) ) {
    showRow( $row );
}

This is unnecessary in new code; instead use:

$res = $dbr->query( 'SELECT * FROM some_table' );
foreach ( $res as $row ) {
    showRow( $row );
}

Spaces

MediaWiki favors a heavily-spaced style for optimum readability.

Put spaces on either side of binary operators, for example:

// No:
$a=$b+$c;

// Yes:
$a = $b + $c;

Put spaces next to parentheses on the inside, except where the parentheses are empty. Do not put a space following a function name.

$a = getFoo( $b );
$c = getBar();

Opinions differ as to whether control structures (if, while, for, foreach, etc.) should be followed by a space; the following two styles are acceptable:

// Spacey
if ( isFoo() ) {
        $a = 'foo';
}

// Not so spacey
if( isFoo() ) {
        $a = 'foo';
}

In comments there should be one space between the # or // character and the comment, and a comment should be put on its own line.

// No:
        public static function getFoo( $bar ) {
                if ( $bar !== false ) { //because this and that..
                        return $bar; //already defined, return it
                }
        }

// Yes:
        public static function getFoo( $bar ) {
                // Because this and that..
                if ( $bar !== false ) {
                        // Already defined, return it.
                        return $bar;
                }
        }

To help developers fix code with an inadequately spacey style, a tool called stylize.php has been created, which uses PHP’s tokenizer extension to enforce most whitespace conventions automatically.

Ternary operator

The ternary operator can be used profitably if the expressions are very short and obvious:

$swat = isset( $this->mParams['swat'] ) ? $this->mParams['swat'] : false;

But if you’re considering a multi-line expression with a ternary operator, please consider using an if() block instead. Remember, disk space is cheap, code readability is everything, “if” is English and ?: is not.

PHP-v5.3 shorthand

Since we still support PHP 5.2.x, use of the shorthand ternary operator (?:) introduced in PHP 5.3 is not allowed.

String literals

For simple string literals, single quotes are slightly faster for PHP to parse than double quotes. Perhaps more importantly, they are easier to type, since you don’t have to press shift. For these reasons, single quotes are preferred in cases where they are equivalent to double quotes.

However, do not be afraid of using PHP’s double-quoted string interpolation feature: $elementId = "myextension-$index"; This has slightly better performance characteristics than the equivalent using the concatenation (dot) operator, and it looks nicer too.

Heredoc-style strings are sometimes useful:

$s = <<<EOT
<div class="mw-some-class">
$boxContents
</div>
EOT;

Some authors like to use END as the ending token, which is also the name of a PHP function. This leads to IRC conversations like the following:

<Simetrical>      vim also has ridiculously good syntax highlighting.
<TimStarling>     it breaks when you write <<<END in PHP
<Simetrical>      TimStarling, but if you write <<<HTML it syntax-highlights as HTML!
<TimStarling>     I have to keep changing it to ENDS so it looks like a string again
<brion-codereview>        fix the bug in vim then!
<TimStarling>     brion-codereview: have you ever edited a vim syntax script file?
<brion-codereview>        hehehe
<TimStarling>     http://tstarling.com/stuff/php.vim
<TimStarling>     that's half of it...
<TimStarling>     here's the other half: http://tstarling.com/stuff/php-syntax.vim
<TimStarling>     1300 lines of sparsely-commented code in a vim-specific language
<TimStarling>     which turns out to depend for its operation on all kinds of subtle inter-pass effects
<werdnum> TimStarling: it looks like some franken-basic language.

Functions and parameters

Avoid passing huge numbers of parameters to functions or constructors:

// Constructor for Block.php as of 1.17. *DON'T* do this!
function __construct( $address = '', $user = 0, $by = 0, $reason = '',
        $timestamp = 0, $auto = 0, $expiry = '', $anonOnly = 0, $createAccount = 0, $enableAutoblock = 0,
        $hideName = 0, $blockEmail = 0, $allowUsertalk = 0 )
{
        ...
}

It quickly becomes impossible to remember the order of parameters, and you will inevitably end up having to hardcode all the defaults in callers just to customise a parameter at the end of the list. If you are tempted to code a function like this, consider passing an associative array of named parameters instead.
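The associative-array approach might look roughly like the sketch below. The function name makeBlock() and its option keys are hypothetical, chosen only for illustration; this is not the real Block API.

```php
<?php
// Hypothetical example: makeBlock() and its option names are illustrative only.
function makeBlock( $address, array $options = array() ) {
    // Defaults live in one place; callers override only what they need.
    $defaults = array(
        'reason' => '',
        'anonOnly' => false,
        'blockEmail' => false,
    );
    // Array plus keeps the left-hand value for keys present in both arrays,
    // so caller-supplied options win over the defaults.
    $options = $options + $defaults;

    return array( 'address' => $address ) + $options;
}

// Only the option being customised has to be named.
$block = makeBlock( '127.0.0.1', array( 'blockEmail' => true ) );
var_dump( $block['blockEmail'] );
var_dump( $block['anonOnly'] );
```

The caller no longer needs to remember parameter order or repeat defaults, and new options can be added without touching existing call sites.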

In general, using boolean parameters is discouraged in functions. In $object->getSomething( $input, true, true, false ), without looking up the documentation for MyClass::getSomething(), it is impossible to know what those parameters are meant to indicate. It is much better either to use class constants and a generic flag parameter, combining flags with bitwise OR:

$myResult = MyClass::getSomething( $input, MyClass::FROM_DB | MyClass::PUBLIC_ONLY );

Or to make your function accept an array of named parameters:

$myResult = MyClass::getSomething( $input, array( 'fromDB', 'publicOnly' ) );
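For the flag-parameter style, a minimal sketch of how the receiving side could be written is shown here. MyClass, its constants and the returned string are assumptions made up for the illustration; the point is only that each flag gets its own bit, callers combine flags with bitwise OR, and the function tests them with bitwise AND.

```php
<?php
// Hypothetical class: the names mirror the example above but are not a real API.
class MyClass {
    // Each flag occupies its own bit so that flags can be combined.
    const FROM_DB = 1;
    const PUBLIC_ONLY = 2;

    public static function getSomething( $input, $flags = 0 ) {
        $parts = array( $input );
        // Bitwise AND tests whether a particular flag was set by the caller.
        if ( $flags & self::FROM_DB ) {
            $parts[] = 'from-db';
        }
        if ( $flags & self::PUBLIC_ONLY ) {
            $parts[] = 'public-only';
        }
        return implode( ',', $parts );
    }
}

// Bitwise OR combines flags; AND of two distinct single-bit flags would be 0.
echo MyClass::getSomething( 'x', MyClass::FROM_DB | MyClass::PUBLIC_ONLY ), "\n";
```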

Try not to repurpose variables over the course of a function, and avoid modifying the parameters passed to a function (unless they’re passed by reference and that’s the whole point of the function, obviously).

C borrowings

The PHP language was designed by people who love C and wanted to bring souvenirs from that language into PHP. But PHP has some important differences from C.

In C, constants are implemented as preprocessor macros and are fast. In PHP, they are implemented by doing a runtime hashtable lookup for the constant name, and are slower than just using a string literal. In most places where you would use an enum or enum-like set of macros in C, you can use string literals in PHP.

PHP has three special literals: true, false and null. Homesick C developers write null as NULL because they want to believe that it is a macro defined as ((void*)0). This is not necessary.

Use elseif not else if. They have subtly different meanings:

// This:
if( $foo == 'bar' ) {
        echo 'Hello world';
} else if( $foo == 'Bar' ) {
        echo 'Hello world';
} else if( $baz == $foo ) {
        echo 'Hello baz';
} else {
        echo 'Eh?';
}

// Is actually equivalent to:
if( $foo == 'bar' ) {
        echo 'Hello world';
} else {
        if( $foo == 'Bar' ) {
                echo 'Hello world';
        } else  {
                if( $baz == $foo ) {
                        echo 'Hello baz';
                } else {
                        echo 'Eh?';
                }
        }
}

And the latter has poorer performance.

Naming

Use lowerCamelCase when naming functions or variables. For example:

private function doSomething( $userPrefs, $editSummary )

Use UpperCamelCase when naming classes: class ImportantClass. Use uppercase with underscores for global and class constants: DB_MASTER, Revision::REV_DELETED_TEXT. Other variables are usually lowercase or lowerCamelCase; avoid using underscores in variable names.

There are also some prefixes used in different places:

Functions

  • wf (wiki functions) – top-level functions, e.g.
function wfFuncname() { ... }

Verb phrases are preferred: use getReturnText() instead of returnText().

Variables

  • $wg – global variables, e.g. $wgVersion, $wgTitle. Always use this for new globals, so that it’s easy to spot missing “global $wgFoo” declarations. In extensions, the extension name should be used as a namespace delimiter. For example, $wgAbuseFilterConditionLimit, not $wgConditionLimit.

It is common to work with an instance of the Database class; we have a naming convention for these which helps keep track of the nature of the server to which we are connected. This is of particular importance in replicated environments, such as Wikimedia and other large wikis; in development environments there is usually no difference between the two types, which can conceal subtle errors.

  • $dbw – a Database object for writing (a master connection)
  • $dbr – a Database object for non-concurrency-sensitive reading (this may be a read-only slave, slightly behind master state, so don’t ever try to write to the database with it, or get an “authoritative” answer to important queries like permissions and block status)

The following may be seen in old code but are discouraged in new code:

  • $ws – Session variables, e.g. $_SESSION['wsSessionName']
  • $wc – Cookie variables, e.g. $_COOKIE['wcCookieName']
  • $wp – Post variables (submitted via form fields), e.g. $wgRequest->getText( 'wpLoginName' )
  • $m – object member variables: $this->mPage. This is discouraged in new code, but try to stay consistent within a class.

Pitfalls

  • Understand and read the documentation for isset() and empty(). Use them only when appropriate.
    • empty() is inverted conversion to boolean with error suppression. Only use it when you really want to suppress errors. Otherwise just use !. Do not use it to test if an array is empty, unless you simultaneously want to check if the variable is unset.
    • Do not use isset() to test for null. Using isset() in this situation could introduce errors by hiding misspelled variable names. Instead, use $var === null.
  • Study the rules for conversion to boolean. Be careful when converting strings to boolean.
  • Be careful with double-equals comparison operators. Triple-equals is often more intuitive.
    • 'foo' == 0 is true (in PHP versions before 8.0)
    • '000' == '0' is true
    • '000' === '0' is false
  • Array plus does not renumber the keys of numerically-indexed arrays, so array( 'a' ) + array( 'b' ) === array( 'a' ). If you want keys to be renumbered, use array_merge(): array_merge( array( 'a' ), array( 'b' ) ) == array( 'a', 'b' )
  • Make sure you have error_reporting set to E_ALL for PHP 5. This will notify you of undefined variables and other subtle gotchas that stock PHP will ignore. See also Manual:How to debug.
  • When working in a pure PHP environment, remove any trailing ?> tags. These tags often cause issues with trailing white-space and “headers already sent” error messages (cf. bugzilla:17642 and http://news.php.net/php.general/280796).
  • Do not use the ‘goto’ syntax introduced in 5.3. PHP may have introduced the feature, but that does not mean we should use it.
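A few of the pitfalls above can be checked directly. The following is a small sketch that only uses built-in PHP behaviour; the variable names are arbitrary.

```php
<?php
// Array plus keeps the left-hand element when keys collide (both arrays use key 0).
$plus = array( 'a' ) + array( 'b' );
// array_merge() renumbers numeric keys, so both elements survive.
$merged = array_merge( array( 'a' ), array( 'b' ) );

var_dump( $plus === array( 'a' ) );
var_dump( $merged === array( 'a', 'b' ) );

// Numeric strings are compared as numbers by ==, but === compares them as strings.
var_dump( '000' == '0' );
var_dump( '000' === '0' );

// An explicit null test. If the variable name were misspelled here, PHP would
// report an undefined variable instead of silently hiding it as isset() does.
$var = null;
var_dump( $var === null );
```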

Comments and Documentation

The Doxygen documentation style is used (it is very similar to PHPDoc for the subset that we use). Code documentation typically gives a description of a function or method, the parameters it takes (using @param), and what it returns (using @return), and may also use the @ingroup or @author tags.

Use @ rather than \ as the escape character (i.e. use @param rather than \param). Both styles work in Doxygen, but for backwards and future compatibility MediaWiki has chosen the @param style as its convention.

Use /** to begin the comments, instead of the Qt-style formatting /*!.

The general format for parameters is: @param type $varname: description. Multiple types can be listed by separating them with a pipe character.

Doxygen documentation states that @param should have the same format as phpDocumentor:

@param  datatype1|datatype2 $paramname description

For every public interface (method, class, variable, whatever) you add or change, a @since tag should be provided, so people extending the code via this interface know they are breaking compatibility with older versions of the code.

class Foo {

        /**
         * @var array $bar: Description here
         * @example array( 'foo' => Bar, 'quux' => Bar, .. )
         */
        protected $bar;

        /**
         * Short description here, followed by documentation of the parameters.
         *
         * @since 1.42
         *
         * @param FooContext $context
         * @param array|string $options: Optionally pass extra options. Either a string or an array of strings.
         * @return Foo|null: New instance of Foo or null if quuxification failed.
         *
         * Some example:
         * @code
         * ...
         * @endcode
         */
        public function makeQuuxificatedFoo( FooContext $context = null, $options = array() ) {
                /* .. */
        }

}

PHPDoc was used at the very beginning but got replaced with Doxygen for performance reasons. We should probably drop PHPDoc compatibility.

@var: documenting class members

There is a ‘bug’ in Doxygen which affects MediaWiki’s documentation: using @var to specify the class members’ type only works if the variable name is appended:

        /**
         * Some explanation about the variable
         *
         * @var string $msg
         */
        protected $msg;

If you don’t append the variable name Doxygen will ignore the entire comment block and it will not be included in the docs.

Integration

There are a few pieces of code in the MediaWiki codebase which are intended to be standalone and easily portable to other applications; examples include the UTF normalisation in /includes/normal and the libraries in /includes/libs. Apart from these, code should be integrated into the rest of the MediaWiki environment, and should allow other areas of the codebase to integrate with it in return.

Global objects

Do not access the PHP superglobals $_GET, $_POST, etc., directly; use $request->get*( 'param' ) instead; there are various functions depending on what type of value you want. You can get a WebRequest from the nearest RequestContext, or if absolutely necessary $wgRequest. Equally, do not access $_SERVER directly; use $request->getIP() if you want to get the IP address of the current user.

Static methods and properties

Static methods and properties are useful for programmers because they act like globals without polluting the global namespace. However, they make subclassing and reuse more difficult for other developers. Generally, you should avoid introducing static functions and properties when you can, especially if the sole purpose is to just save typing.

For example, lots of developers would prefer to write something like:

Foo::bar();

This is because it is shorter and takes fewer keystrokes. However, by doing this you’ve made the Foo class much harder to subclass and reuse. Instead of introducing a static method, you could just type:

$f = new Foo();
$f->bar();

Remember, shorter does not always mean better, and you should take the time to design your classes in a way that makes them easy to reuse.
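To make the reuse argument concrete, here is a minimal sketch. SpecialFoo and describe() are hypothetical names introduced only for this illustration. Because describe() works with whatever object it is handed, a subclass can be substituted without changing the calling code, which a hard-coded Foo::bar() call would not allow.

```php
<?php
class Foo {
    public function bar() {
        return 'Foo::bar';
    }
}

// Hypothetical subclass that changes the behaviour of bar().
class SpecialFoo extends Foo {
    public function bar() {
        return 'SpecialFoo::bar';
    }
}

// The caller depends only on the object it receives, not on a class name.
function describe( Foo $f ) {
    return $f->bar();
}

echo describe( new Foo() ), "\n";
echo describe( new SpecialFoo() ), "\n";
```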

Late static binding

In PHP 5.3, a new feature called “Late Static Binding” (LSB) was added to help work around this perceived lack of functionality in static functions. However, MediaWiki developers disagree about how useful LSB is, so it should be avoided for the time being.

Classes

Encapsulate your code in an object-oriented class, or add functionality to existing classes; do not add new global functions or variables. Try to be mindful of the distinction between ‘backend’ classes, which represent entities in the database (e.g. User, Block, Revision, etc.), and ‘frontend’ classes, which represent pages or interfaces visible to the user (SpecialPage, Article, ChangesList, etc.). Even if your code is not obviously object-oriented, you can put it in a static class (e.g. IP or Html).

As a holdover from PHP 4’s lack of private class members and methods, older code will be marked with comments such as /** @private */ to indicate the intention; respect this as if it were enforced by the interpreter.

Mark new code with proper visibility modifiers, including public if appropriate, but do not add visibility to existing code without first checking, testing and refactoring as required. It’s generally a good idea to avoid visibility changes unless you’re making changes to the function which would break old uses of it anyway.

Error handling

Don’t suppress errors with PHP’s @ operator, for any reason ever. It’s broken when E_STRICT is enabled and it causes an unlogged, unexplained error if there is a fatal, which is hard to support. Use wfSuppressWarnings() and wfRestoreWarnings() instead. The checkSyntax.php maintenance script can check for this error for you.

When your code encounters a sudden error, you should throw a MWException (or an appropriate subclass) rather than using PHP’s trigger_error. The exception handler will display this as nicely as possible to the end user and wiki administrator, and also provides a stack trace to developers.


9 Magic Methods for PHP

The “magic” methods are ones with special names, starting with two underscores, which denote methods which will be triggered in response to particular PHP events.

That might sound slightly automagical, but it’s actually pretty straightforward. We already saw an example of this in the last post, where we used a constructor, so we’ll use this as our first example.

__construct

The constructor is a magic method that gets called when the object is instantiated. It is usually the first thing in the class declaration, but it does not need to be: it is a method like any other and can be declared anywhere in the class.

Constructors also inherit like any other method. So if we consider our previous inheritance example from the Introduction to OOP, we could add a constructor to the Animal class like this:

class Animal {

  public function __construct() {
    $this->created = time();
    $this->logfile_handle = fopen('/tmp/log.txt', 'w');
  }

}

animal.php

Now we can create a class which inherits from the Animal class – a Penguin! Without adding anything into the Penguin class, we can declare it and have it inherit from Animal, like this:

class Penguin extends Animal {

}

$tux = new Penguin;
echo $tux->created;

If we define a __construct method in the Penguin class, then Penguin objects will run that instead when they are instantiated. Since there isn’t one, PHP looks to the parent class definition for information and uses that. So we can override, or not, in our new class – very handy.
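As a sketch of the overriding case: when Penguin defines its own constructor, the parent constructor is not run automatically, so it has to be called explicitly with parent::__construct() if its setup is still wanted. The $name parameter is an assumption added for this illustration, and the log file from the earlier listing is left out to keep the example self-contained.

```php
<?php
class Animal {
  public $created;

  public function __construct() {
    $this->created = time();
  }
}

class Penguin extends Animal {
  public $name;

  public function __construct($name) {
    // Run the parent's constructor first so that $created is still set.
    parent::__construct();
    $this->name = $name;
  }
}

$tux = new Penguin('tux');
echo $tux->name . " was created at " . $tux->created . "\n";
```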

__destruct

Did you spot the file handle that was also part of the constructor? We don’t really want to leave things like that lying around when we finish using an object and so the __destruct method does the opposite of the constructor. It gets run when the object is destroyed, either expressly by us or when we’re not using it any more and PHP cleans it up for us. For the Animal, our __destruct method might look something like this:

class Animal {

  public function __construct() {
    $this->created = time();
    $this->logfile_handle = fopen('/tmp/log.txt', 'w');
  }

  public function __destruct() {
    fclose($this->logfile_handle);
  }
}

animal2.php

The destructor lets us close up any external resources that were being used by the object. In PHP since we have such short running scripts (and look out for greatly improved garbage collection in newer versions), often issues such as memory leaks aren’t a problem. However it’s good practice to clean up and will give you a more efficient application overall!

__get

This next magic method is a very neat little trick to use – it makes properties which actually don’t exist appear as if they do. Let’s take our little penguin:

class Penguin extends Animal {

  public function __construct($id) {
    $this->getPenguinFromDb($id);
  }

  public function getPenguinFromDb($id) {
    // elegant and robust database code goes here
  }
}

penguin1.php

Now if our penguin has the properties “name” and “age” after it is loaded, we’d be able to do:

$tux = new Penguin(3);
echo $tux->name . " is " . $tux->age . " years old\n";

However imagine something changed about the backend database or information provider, so instead of “name”, the property was called “username”. And imagine this is a complex application which refers to the “name” property in too many places for us to change. We can use the __get method to pretend that the “name” property still exists:

class Penguin extends Animal {

  public function __construct($id) {
    $this->getPenguinFromDb($id);
  }

  public function getPenguinFromDb($id) {
    // elegant and robust database code goes here
  }

  public function __get($field) {
    if($field == 'name') {
      return $this->username;
    }
  }
}

penguin2.php

This technique isn’t really a good way to write whole systems, because it makes code hard to debug, but it is a very valuable tool. It can also be used to only load properties on demand or show calculated fields as properties, and a hundred other applications that I haven’t even thought of!
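The calculated-field idea can be sketched as follows. The private $data array, the birth_year key and the age_in_2015 field are all hypothetical, invented for this example; a real class would load its data from wherever the penguin is stored.

```php
<?php
class Penguin {
  // Hypothetical stored data; in practice this would come from the database.
  private $data = array('username' => 'tux', 'birth_year' => 2005);

  public function __get($field) {
    // Alias: the old "name" property now maps to "username".
    if ($field == 'name') {
      return $this->data['username'];
    }
    // Calculated field: derived on demand rather than stored.
    if ($field == 'age_in_2015') {
      return 2015 - $this->data['birth_year'];
    }
    // Unknown properties simply read as null in this sketch.
    return null;
  }
}

$tux = new Penguin();
echo $tux->name . " was " . $tux->age_in_2015 . " in 2015\n";
```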

__set

So we updated all the reads of $this->name to return $this->username, but what about when we want to set that value? Perhaps we have an account screen where users can change their name. Help is at hand in the form of the __set method, which is easiest to illustrate with an example.

class Penguin extends Animal {

  public function __construct($id) {
    $this->getPenguinFromDb($id);
  }

  public function getPenguinFromDb($id) {
    // elegant and robust database code goes here
  }

  public function __get($field) {
    if($field == 'name') {
      return $this->username;
    }
  }

  public function __set($field, $value) {
    if($field == 'name') {
      $this->username = $value;
    }
  }
}

penguin3.php

In this way we can falsify properties of objects, for any one of a number of uses. As I said, not a way to build a whole system, but a very useful trick to know.

__call

There are actually two methods which are similar enough that they don’t get their own title in this post! The first is the __call method, which gets called, if defined, when an undefined method is called on this object.

The second is __callStatic, which behaves in exactly the same way but responds to undefined static method calls instead (only in newer versions, though: this was added in PHP 5.3).

Probably the most common thing I use __call for is polite error handling, and this is especially useful in library code where other people might need to be integrating with your methods.

So for example if a script had a Penguin object called $penguin and it contained $penguin->speak() ... the speak() method isn’t defined so under normal circumstances we’d see:

PHP Fatal error: Call to undefined method Penguin::speak() in …

What we can do is add something to cope more nicely with this kind of failure than the PHP fatal error you see here, by declaring a method __call. For example:

class Animal {
}
class Penguin extends Animal {

  public function __construct($id) {
    $this->getPenguinFromDb($id);
  }

  public function getPenguinFromDb($id) {
    // elegant and robust database code goes here
  }

  public function __get($field) {
    if($field == 'name') {
      return $this->username;
    }
  }

  public function __set($field, $value) {
    if($field == 'name') {
      $this->username = $value;
    }
  }

  public function __call($method, $args) {
      echo "unknown method " . $method;
      return false;
  }
}

penguin4.php

This will catch the error and echo it. In a practical application it might be more appropriate to log a message, redirect a user, or throw an exception, depending on what you are working on – but the concept is the same.

Any misdirected method calls can be handled here however you need to. You can detect the name of the method and respond accordingly; for example, you could handle method renaming in a similar way to how we handled the property renaming above.
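Here is one way the method-renaming idea might look, together with a __callStatic counterpart. The makeNoise() method is a hypothetical replacement for speak(), and throwing the built-in BadMethodCallException is just one possible response to an unknown method.

```php
<?php
class Penguin {
  // Hypothetical new name for what used to be speak().
  public function makeNoise() {
    return "squawk";
  }

  public function __call($method, $args) {
    // Forward the old method name to its replacement.
    if ($method == 'speak') {
      return $this->makeNoise();
    }
    throw new BadMethodCallException("unknown method " . $method);
  }

  public static function __callStatic($method, $args) {
    // Same idea for undefined static calls (PHP 5.3 and later).
    throw new BadMethodCallException("unknown static method " . $method);
  }
}

$penguin = new Penguin();
echo $penguin->speak() . "\n";
```

Code that still calls speak() keeps working, while a genuinely unknown method produces an exception the caller can catch instead of a fatal error.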

__sleep

The __sleep() method is called when the object is serialised, and allows you to control what gets serialised. There are all sorts of applications for this; a good example is an object that contains some kind of pointer, such as a file handle or a reference to another object.

When the object is serialised and then unserialised then these types of references are useless since the target may no longer be present or valid. Therefore it is better to unset these before you store them.

__wakeup

This is the opposite of the __sleep() method and allows you to alter the behaviour of the unserialisation of the object. Used in tandem with __sleep(), this can be used to reinstate handles and object references which were removed when the object was serialised.

A good example application could be a database handle which gets unset when the item is serialised, and then reinstated by referring to the current configuration settings when the item is unserialised.
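A minimal sketch of the two methods working together follows. __sleep() returns an array with the names of the properties that should be serialised, so leaving the handle out of that array is how it gets dropped. The LoggedAnimal class is hypothetical, and php://memory stands in for a real log file so the example does not touch the filesystem.

```php
<?php
class LoggedAnimal {
  public $name;
  public $logfile_handle;

  public function __construct($name) {
    $this->name = $name;
    $this->openLog();
  }

  private function openLog() {
    // php://memory is used so the sketch does not touch the filesystem.
    $this->logfile_handle = fopen('php://memory', 'w');
  }

  public function __sleep() {
    // Return the names of the properties to serialise; the handle is left out.
    return array('name');
  }

  public function __wakeup() {
    // Reinstate the handle that was dropped by __sleep().
    $this->openLog();
  }
}

$copy = unserialize(serialize(new LoggedAnimal('tux')));
var_dump($copy->name);
var_dump(is_resource($copy->logfile_handle));
```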

__clone

We looked at an example of using the clone keyword in the second part of my introduction to OOP in PHP, to make a copy of an object rather than have two variables pointing to the same actual data. By defining a __clone method in a class, we can affect what happens when the clone keyword is used on an object of that class.

While this isn’t something we come across every day, a nice use case is to create a true singleton by adding a private access modifier to the method.
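The singleton use case might be sketched like this. Registry and getInstance() are hypothetical names; the relevant part is that both the constructor and __clone are private, so code outside the class can neither create a second instance nor copy the existing one.

```php
<?php
class Registry {
  private static $instance = null;

  // Private constructor: instances can only be created inside the class.
  private function __construct() {
  }

  // Private __clone: using "clone" on the instance from outside the class fails.
  private function __clone() {
  }

  public static function getInstance() {
    if (self::$instance === null) {
      self::$instance = new Registry();
    }
    return self::$instance;
  }
}

$a = Registry::getInstance();
$b = Registry::getInstance();
// Both variables refer to the same object.
var_dump($a === $b);
```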

__toString

Definitely saving the best until last, the __toString method is a very handy addition to our toolkit. This method can be declared to override the behaviour of an object which is output as a string, for example when it is echoed.

For example if you wanted to just be able to echo an object in a template, you can use this method to control what that output would look like. Let’s look at our Penguin again:

class Penguin {

  public function __construct($name) {
      $this->species = 'Penguin';
      $this->name = $name;
  }

  public function __toString() {
      return $this->name . " (" . $this->species . ")\n";
  }
}

penguin5.php

With this in place, we can literally output the object by calling echo on it, like this:

$tux = new Penguin('tux');
echo $tux;

I don’t use this shortcut often but it’s useful to know that it is there.

More Magic Methods

There is a great reference on the php.net site itself, listing all the available magic methods (yes, there are more than these, I just picked the ones I actually use) so if you want to know what else is available then take the time to check this out.

Hopefully this has been a useful introduction to the main ones. Leave a comment to let us know how you use these in your own projects!
