Thursday, May 27, 2010

Readable PHP code #2: Make your API handle more!

Context

APIs are often designed to operate on scalars. In the following example, the addContact() method operates on a single element:

<?php
class User {
    protected $contacts = array();

    function addContact($contact) {
        $this->contacts[] = $contact;
    }
}
?>

When many contacts have to be added through the above API, the calling code usually looks like this:

<?php
// Some contacts
$contacts = array("Paul", "John", "Maria");

$user = new User();
// Looping over contacts to add them
foreach ($contacts as $contact) {
    $user->addContact($contact);
}
?>

Inner looping

One way to avoid repeating this loop everywhere is to design the API to work with an array of contacts:

<?php
class User {
    protected $contacts = array();

    function addContacts($contacts) {
        foreach ($contacts as $contact) {
            $this->contacts[] = $contact;
        }
    }
}
?>

This can make code that previously needed a loop somewhat clearer:

<?php
// Some contacts
$contacts = array("Paul", "John", "Maria");

$user = new User();
$user->addContacts($contacts);
?>

It also has the benefit of speeding up processing a little, as only one function call is issued!

The side effect is that adding only one contact is not as elegant:

<?php
$contact = "Julia";

$user = new User();
$user->addContacts(array($contact));
?>

A nice PHP trick

The good news is that PHP provides a nice array cast operator, (array), which transforms a scalar value into an array. As described on the manual page about arrays:

"For any of the types: integer, float, string, boolean and resource, converting a value to an array results in an array with a single element with index zero and the value of the scalar which was converted. In other words, (array)$scalarValue is exactly the same as array($scalarValue)."

The addContacts() method shown earlier can be changed to play nicely with both scalars and arrays:

<?php
class User {
    protected $contacts = array();

    function addContacts($contacts) {
        foreach ((array) $contacts as $contact) {
            $this->contacts[] = $contact;
        }
    }
}
?>

addContacts() can now be called with a scalar as follows:

<?php
$contact = "Julia";

$user = new User();
$user->addContacts($contact);
?>
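The casting rules this trick relies on can be checked directly. The following is a minimal sketch; the variable names are only illustrative, and the null case is included because null is not among the types listed in the manual quote:

```php
<?php
// A scalar becomes a one-element array with index zero.
$fromScalar = (array) "Julia";

// An array is left unchanged by the cast.
$fromArray = (array) array("Paul", "John");

// null is not a scalar: it casts to an empty array, so nothing would be added.
$fromNull = (array) null;

var_dump($fromScalar === array("Julia"));        // bool(true)
var_dump($fromArray === array("Paul", "John")); // bool(true)
var_dump($fromNull === array());                 // bool(true)
```

Because casting an array is a no-op, the same foreach works unchanged whether the caller passes one contact or many.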

OOPs!

This is a nice trick for defining APIs that can be used with both scalars and arrays. However, it will not work with objects! PHP is able to cast an object to an array, but the result is an array of the object's properties, which is not the intended purpose.
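To make the pitfall concrete, here is a small sketch. The Contact class and its properties are hypothetical and exist only for this illustration:

```php
<?php
// Hypothetical Contact class, used only to illustrate the cast.
class Contact {
    public $name;
    public $email;

    function __construct($name, $email) {
        $this->name = $name;
        $this->email = $email;
    }
}

$contact = new Contact("Julia", "julia@example.com");

// The cast exposes the object's properties instead of wrapping the object:
// $cast is array("name" => "Julia", "email" => "julia@example.com").
$cast = (array) $contact;

// Wrapping the object explicitly is what addContacts() would actually need.
$wrapped = array($contact);

var_dump(array_keys($cast));        // the property names, not the object
var_dump($wrapped[0] === $contact); // bool(true)
```

Looping over $cast would add the strings "Julia" and "julia@example.com" as two separate contacts, rather than the single Contact object.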

If your contacts are objects, you will have to change the addContacts() function to something like:

<?php
class User {
    protected $contacts = array();

    function addContacts($contacts) {
        if (is_array($contacts)) {
            foreach ($contacts as $contact) {
                $this->contacts[] = $contact;
            }
        } else {
            $this->contacts[] = $contacts;
        }
    }
}
?>

This may not be as elegant as the array casting method. However, it will enable your API to work seamlessly with both single values and arrays, even when the contacts are objects.

The benefit of working with an array-capable API is that you can sometimes optimize the operations. For example, when retrieving or deleting elements from a database, you might use the "WHERE id IN (...)" syntax to match multiple elements at once rather than one by one. In the examples used in this article, an interesting optimization is to use the native array_merge() function, which avoids reinventing the wheel by looping over the elements with a foreach construct and adding them one by one:

<?php
class User {
    protected $contacts = array();

    function addContacts($contacts) {
        if (is_array($contacts)) {
            $this->contacts = array_merge($this->contacts, $contacts);
        } else {
            $this->contacts[] = $contacts;
        }
    }
}
?>
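The database case mentioned above can be sketched in the same spirit. This is only an illustration: the function name buildDeleteContactsQuery(), the "contacts" table and the "id" column are assumptions, and the function only builds the statement and its parameters, leaving execution to whatever database layer you use:

```php
<?php
// Hypothetical helper: builds a parameterized DELETE for one id or many.
// The table name "contacts" and the column "id" are assumptions.
function buildDeleteContactsQuery($ids) {
    // Accept a scalar id or an array of ids.
    $ids = array_values((array) $ids);
    if (count($ids) === 0) {
        return null; // nothing to delete, and "IN ()" is not valid SQL
    }
    // One "?" placeholder per id, to be bound by a prepared statement.
    $placeholders = implode(", ", array_fill(0, count($ids), "?"));
    return array(
        "sql"    => "DELETE FROM contacts WHERE id IN ($placeholders)",
        "params" => $ids,
    );
}

$single = buildDeleteContactsQuery(42);
$many   = buildDeleteContactsQuery(array(1, 2, 3));

echo $single["sql"], "\n"; // DELETE FROM contacts WHERE id IN (?)
echo $many["sql"], "\n";   // DELETE FROM contacts WHERE id IN (?, ?, ?)
```

One query is issued regardless of how many ids are passed, which is where the bulk API pays off compared with one query per element.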

Conclusion

In this article we have seen the advantages of creating an API which can handle multiple elements at once. While there can be a benefit in terms of speed (which heavily depends on your business logic), don't forget that code readability matters even more! Hopefully, this tip improves both.

For those who care about performance, doing:

<?php
// Adding 5,000,000 contacts
$user->addContacts(range(1, 5e6));
?>

is about 40% faster than:

<?php
foreach (range(1, 5e6) as $contact) {
    $user->addContact($contact);
}
?>
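A comparison like this can be reproduced with a rough timing sketch such as the one below. The BenchUser class is hypothetical and simply combines both methods from this article; the element count is reduced to keep memory usage low, and the measured ratio will vary with the PHP version and the machine:

```php
<?php
// Hypothetical class combining both methods, for measurement only.
class BenchUser {
    protected $contacts = array();

    function addContact($contact) {
        $this->contacts[] = $contact;
    }

    function addContacts($contacts) {
        if (is_array($contacts)) {
            $this->contacts = array_merge($this->contacts, $contacts);
        } else {
            $this->contacts[] = $contacts;
        }
    }

    function countContacts() {
        return count($this->contacts);
    }
}

// Fewer elements than in the article, to keep the sketch lightweight.
$contacts = range(1, 100000);

// Bulk version: a single method call.
$bulkUser = new BenchUser();
$start = microtime(true);
$bulkUser->addContacts($contacts);
$bulkTime = microtime(true) - $start;

// Loop version: one method call per contact.
$loopUser = new BenchUser();
$start = microtime(true);
foreach ($contacts as $contact) {
    $loopUser->addContact($contact);
}
$loopTime = microtime(true) - $start;

printf("bulk: %.4fs, loop: %.4fs\n", $bulkTime, $loopTime);
```

A single run is noisy, so repeating the measurement several times gives a more trustworthy picture.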

Thanks to Paul Dragoonis, Paul Borgermans and Jérôme Renard for reviewing this article

3 comments:

Artem Nezvigin said...

I disagree. Strongly.

If your addContact() method needs more parameters, how do you handle that? Example:

public function addContact($contact, $sendWelcomeEmail=false);

Now you need to pass the boolean into each element of your array. Ugly.

What if your method uses type hinting to accept a data access object?

public function addContact(UsersTable $user, $sendWelcomeEmail=false);

In the simplest of scenarios, maybe, your recommendation holds water. In the real world it breaks semantics, makes things harder to understand and unnecessarily makes the code more obscure and unreadable.

If you want to support an array of entries, just add another method like this:

addMultipleContacts(array $contacts, $sendWelcomeEmail=false);

Just because it's less code doesn't make it better. Sometimes verbosity and being explicit is a good thing. There's absolutely nothing wrong with looping and adding a contact on each iteration. Nothing at all.

Patrick Allaert said...

@Artem,

Don't get me wrong, there is no such thing as an absolute rule which forces every method to accept more than one element. This should be applied only where it makes sense!

Taking your example into account:
public function addContact($contact, $sendWelcomeEmail = false)
might simply be:
public function addContacts($contacts, $sendWelcomeEmail = false)

Where $contacts can be one or more contacts and $sendWelcomeEmail would be a single boolean for the whole list of contacts.

If this boolean needs to be different for each contact, then nothing prevents you from adding them one by one:
foreach ($contacts as $contact) {
    $user->addContacts($contact, /* bool */);
}

At least, you have the choice!

Type hinting is indeed not directly possible although you have a few alternatives:
* Define a ContactIterator which you will use as type hinting.
* Handling the type yourself.

I don't agree with you saying it is harder to read, check for yourself at: http://pallaert.pastebin.com/i9AXxqSr

Having two methods to achieve almost the same thing is the best way to confuse people, check http://chaos.troll.no/~shausman/api-design/api-design.pdf for the characteristics of Good APIs.

Of course there is nothing wrong with doing a loop! But I assure you that APIs supporting bulk operations, where it makes sense, are sometimes much easier to read and use, perform better, or even both.

Similarly, which one of the following set of shell commands do you consider more readable?
$ for file in *.txt ; do rm $file ; done
or:
$ rm *.txt

I will insist again: apply this where it makes sense!

Chris Henry said...

Patrick, I actually found your array casting solution very elegant. In my case, handling whatever the user decided to pass in was very important, because our API had to be easy to use. I wound up jumping through a number of hoops and did a lot of type checking to make sure we achieved our goal.

Great little piece of advice!