Testing URL format – URL should contain only allowed characters


Sometimes, issues occur due to incorrect URL formatting.

For example:

On a web page, a URL is created dynamically from user input.

A user uploads photos for a car model, and the URL for the photo location is built from the car name.

The URL format is: http://www.domain.com/makename-modelname-uniqueid/

If a user uploads a photo for a Honda Jazz car, the image will be saved with the URL http://www.domain.com/honda-jazz-1093/
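As a rough illustration, URL generation of this kind could be sketched as below. The function names `slugify` and `build_photo_url` and the rule "lowercase, keep only ASCII letters and digits" are assumptions made for this example, not the actual implementation of any particular site.

```python
import re

BASE_URL = "http://www.domain.com"  # placeholder domain from the example above


def slugify(name: str) -> str:
    """Lowercase the name and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_photo_url(make: str, model: str, unique_id: int) -> str:
    """Assemble makename-modelname-uniqueid under the base URL (assumed format)."""
    return f"{BASE_URL}/{slugify(make)}-{slugify(model)}-{unique_id}/"


print(build_photo_url("Honda", "Jazz", 1093))
# prints http://www.domain.com/honda-jazz-1093/
```

Stripping disallowed characters is only one possible design; a site could also replace them with hyphens or percent-encode them.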

Testing in these cases should involve checking the URL that is dynamically created. You need to verify that the URL does not contain invalid characters in any possible scenario.

Some passing and failing test scenarios for the above example are listed below.

For Car = Maruti Suzuki Wagon R 1.0

1. http://www.domain.com/maruti suzuki-wagon r 1.0-2455/

=> The make and model names contain spaces, so the URL contains invalid space characters. The model name also contains a dot (.), which has been carried into the URL and does not belong in the expected makename-modelname-uniqueid format. The URL fails the test.

2. http://www.domain.com/marutisuzuki-wagonr10-2455/

=> The spaces and the dot (.) have been removed while forming the URL. The URL contains only valid characters and is therefore valid.

For Car = Ford Fiesta (2006-2011)

1. http://www.domain.com/ford-fiesta (2006-2011)-2456/

=> The model name contains a space and bracket characters, which have been carried into the URL unencoded. The URL is invalid.

2. http://www.domain.com/ford-fiesta20062011-2456/

=> The space and bracket characters have been removed while forming the URL. The URL contains only valid characters and is therefore valid.
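The scenarios above can be turned into an automated check. The sketch below assumes a stricter rule than URL syntax alone requires: the single path segment may contain only lowercase letters, digits, and hyphens. The names `SLUG_PATTERN` and `is_valid_photo_url` are hypothetical and used only for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed slug rule for this example: lowercase letters and digits, joined by hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_photo_url(url: str) -> bool:
    """Return True if the URL's path is a single segment matching the assumed slug rule."""
    segment = urlsplit(url).path.strip("/")
    return bool(SLUG_PATTERN.fullmatch(segment))


# (url, expected result) pairs taken from the scenarios above
scenarios = [
    ("http://www.domain.com/maruti suzuki-wagon r 1.0-2455/", False),
    ("http://www.domain.com/marutisuzuki-wagonr10-2455/", True),
    ("http://www.domain.com/ford-fiesta (2006-2011)-2456/", False),
    ("http://www.domain.com/ford-fiesta20062011-2456/", True),
]

for url, expected in scenarios:
    result = is_valid_photo_url(url)
    status = "as expected" if result == expected else "UNEXPECTED"
    print(f"{url!r}: valid={result} ({status})")
```

A whitelist pattern like this tends to be safer than a list of forbidden characters, because any character nobody thought of is rejected by default.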

The allowed URL characters are as follows:

Unreserved

These characters can be used unencoded anywhere in the URL.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - . _ ~

Reserved

These characters are used as delimiters and should not be used unencoded for any other purpose.

Reserved characters should appear only as part of the URL structure. They should not be contained in the dynamic values passed into it.

: / ? # [ ] @ ! $ & ' ( ) * + ; , =

Of the characters listed above, only the unreserved characters (a-z, A-Z, 0-9, -, ., _, and ~) can be used for the actual name parts of the URL. Any other character needs to be percent-encoded. Note that the dot (.) is an unreserved character under RFC 3986, so it is valid in a URL; a site may still choose to strip it from generated names, as the car example above does.
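For a quick look at what percent-encoding does to a dynamic value, here is a minimal sketch using `quote` and `unquote` from Python's standard `urllib.parse` module. Passing `safe=""` is a choice made for this example so that `/` is encoded too; the sample values are taken from the car names used earlier.

```python
from urllib.parse import quote, unquote

# Dynamic values that contain characters outside the unreserved set
values = ["Maruti Suzuki", "Fiesta (2006-2011)"]

for value in values:
    # safe="" means no reserved character is left unencoded, not even "/"
    encoded = quote(value, safe="")
    print(f"{value!r} -> {encoded!r}")
    # Decoding restores the original text
    assert unquote(encoded) == value

# 'Maruti Suzuki' -> 'Maruti%20Suzuki'
# 'Fiesta (2006-2011)' -> 'Fiesta%20%282006-2011%29'
```

Encoded values are valid but harder to read than stripped ones, which is one reason a site might prefer to remove such characters when generating names.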

(Read more about reserved/unreserved characters here…)

Here is why this matters:

1. Unencoded and invalid characters (like space, bracket or å) in a URL don’t work in all user agents. Newer versions of browsers seem to handle them fine but older browsers may not be able to follow links or load images.

2. They may make URLs ugly and hard to read, since browsers may percent-encode some of these characters before displaying them in the location bar. This varies from browser to browser. A URL like http://example.com/å ä ö/ may be displayed as http://example.com/å ä ö/, http://example.com/å%20ä%20ö/, http://example.com/%C3%A5%20%C3%A4%20%C3%B6/, or even http://example.com/√•%20√§%20√∂/.
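The display variants listed above can be reproduced with a short sketch. The last form is assumed here to come from UTF-8 bytes being misread as Mac Roman; that is one plausible explanation for the garbled characters, not a confirmed description of any specific browser.

```python
from urllib.parse import quote

segment = "å ä ö"

# Only the spaces are percent-encoded
spaces_only = segment.replace(" ", "%20")

# Every character outside the unreserved set is encoded as UTF-8 bytes
fully_encoded = quote(segment, safe="")

# Assumption: UTF-8 bytes wrongly decoded as Mac Roman, then spaces encoded
misdecoded = segment.encode("utf-8").decode("mac_roman").replace(" ", "%20")

for form in (segment, spaces_only, fully_encoded, misdecoded):
    print(f"http://example.com/{form}/")
```

Keeping generated names within the unreserved ASCII characters avoids all four variants, because there is then nothing left to encode or misread.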

Always check that the URLs generated/specified in your website contain valid characters as specified above.
