Check broken links and handle multiple windows through Selenium webdriver in Python


Today I am going to post the code of a function I wrote to check broken links in a web page.

You can learn two things by going through it –

First, how to check broken links (I know that’s obvious :D).
Second, how to handle multiple windows in Selenium WebDriver + Python. Three members of the webdriver object are useful for this:

1. webdriver.window_handles 
 => property; returns the handles of all windows within the 
    current webdriver session
2. webdriver.current_window_handle
 => property; returns the handle of the current window 
     (the window currently in focus)
3. webdriver.switch_to_window(window_name) 
 => method; switches focus to the window with the specified 
    window_name or window_handle 
    (a window_handle can be passed in place of window_name);
    newer Selenium releases spell this call
    webdriver.switch_to.window(window_name)
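
As a rough illustration of how these three members fit together, here is a minimal sketch. The helper names switch_to_new_window and close_and_return are hypothetical (they are not part of Selenium), and the sketch assumes a Selenium release that provides driver.switch_to.window. It compares the handles seen before and after a click instead of relying on the order of window_handles.

```python
def switch_to_new_window(driver, handles_before):
    """Switch focus to the first window that was not open before the click.

    driver is assumed to be a Selenium WebDriver instance; handles_before is
    the value of driver.window_handles captured before the link was clicked.
    Returns the new window's handle, or None if no new window appeared.
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    return None


def close_and_return(driver, original_handle):
    """Close the window currently in focus and go back to the original one."""
    driver.close()
    driver.switch_to.window(original_handle)
```

A typical use would be to save driver.current_window_handle and list(driver.window_handles), click the link, call switch_to_new_window, inspect driver.current_url, and finally call close_and_return with the saved handle.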

To check for broken links, my code clicks each link on the page and checks whether the browser lands on the page that the link points to.
Some links, when clicked, open their page in a separate window. The most common examples are a page's links to social media sites.

For example, the code below tests the home page of “http://www.carwale.com/”.
This page has 4 links (the icons at the bottom of the page for Facebook, YouTube, Google Plus, and Twitter) that open their respective pages in a new window when clicked. This is where the section that handles multiple windows comes into play.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
home_page = "http://www.carwale.com/"


def is_link_element_displayed(driver, element):
    # This helper was not shown in the original post; it is assumed here
    # to be a thin wrapper around WebElement.is_displayed()
    return element.is_displayed()


def check_page_broken_links(driver, url):
    # Sample usage:
    #     check_page_broken_links(browser, "http://www.carwale.com/")
    #         returns an empty list if all links on the page work fine,
    #         else a list of all the broken links
    #                                    (either link text, or link href)
    #   Checks for -  i) "Page Not found" error
    #                 ii) Redirects
    failed = []
    try:
        driver.implicitly_wait(5)
        driver.get(url)
        number_of_links = len(driver.find_elements(By.TAG_NAME, 'a'))

        for i in range(number_of_links):
            # Save current browser window handle
            initial_window = driver.current_window_handle

            # Look the link up again: the page is reloaded at the end of
            # every iteration, so earlier element references are stale
            link = driver.find_elements(By.TAG_NAME, 'a')[i]
            link_address = link.get_attribute("href")
            link_name = link.text
            print("link checked: ", i, ": ", link_name, ": ", link_address)

            if (link_address is not None
                    and "google" not in link_address
                    and "mailto" not in link_address
                    and is_link_element_displayed(driver, element=link)):
                link.click()
                open_windows = driver.window_handles

                # Navigate to the browser window where the latest page was
                # opened (switch_to.window replaces the older switch_to_window)
                driver.switch_to.window(open_windows[-1])
                time.sleep(5)
                print("defined: ", link_address)
                print("current: ", driver.current_url)

                if (link_address.endswith("#")
                        and driver.current_url in
                        [link_address, link_address[:-1],
                         link_address[:-2], link_address + '/']):
                    # A "#" at the end means the user
                    # stays on the same page (valid scenario)
                    pass
                elif (driver.current_url not in
                      [link_address, home_page + link_address[1:]]):
                    # The user landed on a page different
                    # from the one intended
                    failed.append(link_name or link_address)

                if len(driver.window_handles) > 1:
                    # Close the newly opened window
                    driver.close()

                # Switch back to the main browser window
                driver.switch_to.window(initial_window)

            driver.get(url)
    except Exception as e:
        return ['Exception occurred while checking', e]
    return failed


# Call the function to check broken links on the carwale.com home page
print(check_page_broken_links(browser, "http://www.carwale.com/"))

If the page at the URL you passed to the function has any broken links, they will be printed as a list at the end of program execution.
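
The URL comparison at the heart of the function above can also be pulled out into a small pure function, which makes it possible to test the rule without opening a browser. The following is only a sketch of that idea; the name landed_on_expected_page is hypothetical, and the rules are copied from the function above, not extended.

```python
def landed_on_expected_page(link_address, current_url, home_page):
    """Return True if current_url is an acceptable result of following link_address."""
    if link_address.endswith("#"):
        # A trailing "#" keeps the user on the same page, so accept the
        # address with or without the "#" (and the slash before it)
        same_page = [link_address, link_address[:-1],
                     link_address[:-2], link_address + "/"]
        if current_url in same_page:
            return True
    # Otherwise the browser must be at the link's address, either as written
    # or joined to the home page for hrefs of the form "/path"
    return current_url in [link_address, home_page + link_address[1:]]
```

A link would then be reported as broken whenever this function returns False for the link's href and the browser's current URL.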

Testing URL format – URL should contain only allowed characters


Sometimes, issues occur due to incorrect URL formatting.

For example:

On a website, a page's URL may be created dynamically from user input.

Say a user uploads photos for a car model, and the URL for the photo location is created from the car name.

The URL format is http://www.domain.com/makename-modelname-uniqueid/

If a user uploads a photo for a Honda Jazz, the image will be saved under the URL http://www.domain.com/honda-jazz-1093/

Testing in these cases should involve checking the URL that is dynamically created. You need to verify that the URL does not contain invalid characters in any possible scenario.

Some failing and passing test scenarios for the above example are listed below.

For Car = Maruti Suzuki Wagon R 1.0

1. http://www.domain.com/maruti suzuki-wagon r 1.0-2455/

=> Invalid. The make and model names contain spaces, and the model name contains a dot (.); all of them were carried into the URL unchanged, so the URL contains unencoded spaces.

2. http://www.domain.com/marutisuzuki-wagonr10-2455/

=> Valid. The spaces and the dot (.) were removed while forming the URL, so it contains only valid characters.

For Car = Ford Fiesta (2006-2011)

1. http://www.domain.com/ford-fiesta (2006-2011)-2456/

=> Invalid. The model name contains a space and bracket characters, and all of them were carried into the URL unchanged.

2. http://www.domain.com/ford-fiesta20062011-2456/

=> Valid. The space and bracket characters were removed while forming the URL, so it contains only valid characters.
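
The passing scenarios above can be reproduced with a few lines of Python. This is a minimal sketch, not the site's actual implementation: the function names slugify and build_photo_url are hypothetical, and the rule "lower-case the name and drop everything except ASCII letters and digits" is an assumption inferred from the two valid URLs shown.

```python
import re


def slugify(name):
    """Lower-case the name and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_photo_url(base, make, model, unique_id):
    """Build a URL of the form base/makename-modelname-uniqueid/."""
    return "{}/{}-{}-{}/".format(
        base.rstrip("/"), slugify(make), slugify(model), unique_id)


print(build_photo_url("http://www.domain.com", "Maruti Suzuki", "Wagon R 1.0", 2455))
# http://www.domain.com/marutisuzuki-wagonr10-2455/
print(build_photo_url("http://www.domain.com", "Ford", "Fiesta (2006-2011)", 2456))
# http://www.domain.com/ford-fiesta20062011-2456/
```

Note that this rule also drops non-ASCII letters such as å, which may or may not be what a real site wants.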

The following are the allowed URL characters.

Unreserved

These characters can be used unencoded anywhere in the URL

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~

Reserved

These characters are used as delimiters, and should not be used unencoded for any other purpose.

Reserved characters should only be used for URL formatting. They should not be contained in the dynamic values passed.

: / ? # [ ] @ ! $ & ' ( ) * + ; , =

Of the characters mentioned above, only the unreserved characters (a-z, A-Z, 0-9, -, _, ., and ~) can be used for the actual name parts of the URL. Any other character needs to be percent-encoded.

(Read more about reserved/unreserved characters here…)

And this is why

1. Unencoded and invalid characters (like space, bracket or å) in a URL don’t work in all user agents. Newer versions of browsers seem to handle them fine but older browsers may not be able to follow links or load images.

2. It may make URLs ugly and hard to read since browsers may percent-encode some of these characters before displaying them in the location bar. This varies from browser to browser. A URL like http://example.com/å ä ö/ may be displayed as http://example.com/å ä ö/, http://example.com/å%20ä%20ö/, http://example.com/%C3%A5%20%C3%A4%20%C3%B6/, or even http://example.com/√•%20√§%20√∂/.

Always check that the URLs generated/specified in your website contain valid characters as specified above.
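
One way to automate such a check is sketched below, using quote() from Python's standard urllib.parse module. With safe="", quote() percent-encodes every character except the unreserved ones, so a URL part that comes back unchanged contains nothing that needs encoding. The function names needs_encoding and encode_segment are hypothetical, and the sketch assumes it is given a raw, not yet encoded, name part of the URL and not a full URL.

```python
from urllib.parse import quote


def needs_encoding(segment):
    """Return True if the raw segment contains characters that must be percent-encoded."""
    return quote(segment, safe="") != segment


def encode_segment(segment):
    """Percent-encode everything in the segment except unreserved characters."""
    return quote(segment, safe="")


print(needs_encoding("honda-jazz-1093"))                # False
print(needs_encoding("ford-fiesta (2006-2011)-2456"))   # True
print(encode_segment("å ä ö"))                          # %C3%A5%20%C3%A4%20%C3%B6
```

A test for dynamically generated URLs could call needs_encoding on each generated name part and fail whenever it returns True.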

Find links to your website through Google


Thanks to a post in another blog (http://blog.nerdstogeeks.com/2009/06/track-links-to-your-site-from-google.html), I learnt a new thing yesterday.

You can find all the references to your website on the World Wide Web with this simple trick.

Just go to Google search, type in “link: ‘Your website URL’”, and press Enter.

Google will list all the links to your website URL, from where people can visit your page. Nice, huh?