Web Scrape

I am trying to collect information from a student directory. Unfortunately, I cannot provide a link, as it requires a login, but I will try to describe it.

The page consists of a table with 63 rows, each beginning with the hyperlinked text "View Details." The link does not open a new webpage; instead, the same page shows the details in a box in the middle of the blurred-out table.

The link for the first item is: javascript:__doPostBack('ctl00$Main$gvSearchResults$ctl02$lbDetail','')

The link for the last item is:
javascript:__doPostBack('ctl00$Main$gvSearchResults$ctl64$lbDetail','')

What I would like to do is:

  1. Click the first link (help!)
  2. Copy the displayed details (I think I can do this)
  3. Click close (I can do this)
  4. Paste the copied information into a document. (I can do this)
  5. Click the next link (help!)
  6. Repeat (help!)
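
The steps above amount to a loop over the rows. A rough sketch of that loop, assuming three hypothetical helper functions (openDetail, readDetail, closeDetail) that would still have to be written for the real page; they are not part of the site or of any library:

```javascript
// Sketch only. The three callbacks are hypothetical and must be supplied:
//   openDetail(row)  - click "View Details" for a zero-based row and wait for the box
//   readDetail()     - return the details currently displayed
//   closeDetail()    - click "close" and wait for the box to go away
async function collectAllDetails(rowCount, { openDetail, readDetail, closeDetail }) {
  const collected = [];
  for (let row = 0; row < rowCount; row++) {
    await openDetail(row);              // steps 1 and 5: click this row's link
    collected.push(await readDetail()); // step 2: copy the displayed details
    await closeDetail();                // step 3: click close
  }
  // Step 4 (pasting into a document) can then happen once, with all rows in hand.
  return collected;
}
```

For the 63 rows described here, the call would be something like collectAllDetails(63, helpers), where helpers is an object holding the three callbacks.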

Then we will need the HTML code around the area that you want to access.
Please post in a Forum Code Block, or upload the HTML text file in a zip file.

Thank you so much for your reply!

Each row has the following:

<tr style="background-color:Beige;">
					<td>          
          <a id="Main_gvSearchResults_lbDetail_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbDetail&#39;,&#39;&#39;)" style="color:DarkBlue;">View Detail</a>
        </td><td>Michael</td><td>Scott</td><td>GRAD  2009  </td><td>
          
          <a id="Main_gvSearchResults_lbPrefEmail_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbPrefEmail&#39;,&#39;&#39;)" style="color:DarkBlue;">Click to View</a>                 
          <a id="Main_gvSearchResults_hplClickPrefEmail_0" title="FAQ - Click to View" href="javascript:openWindow(&#39;faq.aspx?type=search#searchClickToView&#39;,&#39;popup1&#39;,&#39;toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,width=510,height=400&#39;)"><img title="FAQ - Click to View" src="images/questionMark_sm.gif" alt="" /></a>
        </td><td>
          <a id="Main_gvSearchResults_hplAltEmail_0" style="color:DarkBlue;"></a>
                           
          
        </td><td>
          <a id="Main_gvSearchResults_lbLocalAddrRestricted_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbLocalAddrRestricted&#39;,&#39;&#39;)" style="color:DarkBlue;">Click to View</a>                     
          <a id="Main_gvSearchResults_hplClickLocalAddr_0" title="FAQ - Click to View" href="javascript:openWindow(&#39;faq.aspx?type=search#searchClickToView&#39;,&#39;popup1&#39;,&#39;toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,width=510,height=400&#39;)"><img title="FAQ - Click to View" src="images/questionMark_sm.gif" alt="" /></a>
          
        </td><td>
          <a id="Main_gvSearchResults_lbLocalPhoneRestricted_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbLocalPhoneRestricted&#39;,&#39;&#39;)" style="color:DarkBlue;">Click to View</a>                             
          <a id="Main_gvSearchResults_hplClickLocalPhone_0" title="FAQ - Click to View" href="javascript:openWindow(&#39;faq.aspx?type=search#searchClickToView&#39;,&#39;popup1&#39;,&#39;toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,width=510,height=400&#39;)"><img title="FAQ - Click to View" src="images/questionMark_sm.gif" alt="" /></a>          
          
        </td><td>President</td><td>Dunder Miflin</td><td>
          <a id="Main_gvSearchResults_lbBusnAddrRestricted_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbBusnAddrRestricted&#39;,&#39;&#39;)" style="color:DarkBlue;">Click to View</a>                             
          <a id="Main_gvSearchResults_hplClickBusnAddr_0" title="FAQ - Click to View" href="javascript:openWindow(&#39;faq.aspx?type=search#searchClickToView&#39;,&#39;popup1&#39;,&#39;toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,width=510,height=400&#39;)"><img title="FAQ - Click to View" src="images/questionMark_sm.gif" alt="" /></a>          
          
        </td><td>
          <a id="Main_gvSearchResults_lbBusnPhoneRestricted_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbBusnPhoneRestricted&#39;,&#39;&#39;)" style="color:DarkBlue;">Click to View</a>                             
          <a id="Main_gvSearchResults_hplClickBusnPhone_0" title="FAQ - Click to View" href="javascript:openWindow(&#39;faq.aspx?type=search#searchClickToView&#39;,&#39;popup1&#39;,&#39;toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,width=510,height=400&#39;)"><img title="FAQ - Click to View" src="images/questionMark_sm.gif" alt="" /></a>          
          
        </td>
				</tr>

If you click on "View Detail" you get the following, which is the info I want:

<tbody><tr>
				<td><span id="Main_lblNameTitle" style="color:DarkBlue;">Name</span></td><td><span id="Main_lblName">Michael Scott</span></td>
			</tr><tr id="Main_trGradYear">
				<td><span id="Main_lblGradYearTitle" style="color:DarkBlue;">Grad Year</span></td><td><span id="Main_lblGradYear">2009  </span></td>
			</tr><tr id="Main_trDegreeProgram">
				<td><span id="Main_lblDegreeProgramTitle" style="color:DarkBlue;">Degree/Program</span></td><td><span id="Main_lblDegreeProgram">MBA</span></td>
			</tr><tr id="Main_trCohort">
				<td><span id="Main_lblCohortTitle" style="color:DarkBlue;">Cohort</span></td><td><span id="Main_lblCohort">0 </span></td>
			</tr><tr id="Main_trPrefEmail">
				<td><span id="Main_lblPrefEmailTitle" style="color:DarkBlue;">Preferred Email</span></td><td><a id="Main_hplPrefEmail" href="mailto:michael@dundermifflin.com" style="color:DarkBlue;">rich@archerim.com</a></td>
			</tr><tr id="Main_trBestEmail">
				<td><span id="Main_lblBestEmailTitle" style="color:DarkBlue;">Best Email</span></td><td><a id="Main_hplBestEmail" href="mailto:michael@dundermifflin.com" style="color:DarkBlue;">rich@archerim.com</a></td>
			</tr><tr id="Main_trHomepage">
				<td><span id="Main_lblHomepageTitle" style="color:DarkBlue;">Homepage</span></td><td><a id="Main_hplHomepage" href="http://www.dundermifflin.com" target="_blank" style="color:DarkBlue;">http://www.dundermifflin.com</a></td>
			</tr><tr id="Main_trHomeInfo">
				<td valign="top"><span id="Main_lblHomeInfoTitle" style="color:DarkBlue;">Home Information</span></td><td><div id="Main_pnlHomeInfo">
					
                   <span id="Main_lblHomeInfo">725 Slough Avenue                     <br>Suite 1540                              <br>Scranton, PA       </span><br>                  
                
				</div><div id="Main_pnlHomePhone">
					                    
                   <span id="Main_lblHomePhone">5079896428</span>  
                
				</div></td>
			</tr><tr id="Main_trEmployer">
				<td><span id="Main_lblEmployerTitle" style="color:DarkBlue;">Employer </span></td><td><span id="Main_lblEmployer">Dunder Miflin</span></td>
			</tr><tr id="Main_trJobTitle">
				<td><span id="Main_lblJobTitleDesc" style="color:DarkBlue;">Job Title </span></td><td><span id="Main_lblJobTitle">President</span></td>
			</tr><tr id="Main_trIndustry">
				<td><span id="Main_lblIndustryTitle" style="color:DarkBlue;">Industry </span></td><td><span id="Main_lblIndustry">Paper</span></td>
			</tr><tr id="Main_trBusnInfo">
				<td valign="top"><span id="Main_lblBusnInfoTitle" style="color:DarkBlue;">Business Information </span></td><td><div id="Main_pnlBusnInfo">
					
                   <span id="Main_lblBusnInfo">725 Slough Avenue                   <br>Suite 1540                              <br>Scranton, PA       </span><br>                  
                
				</div><div id="Main_pnlBusnPhone">
					                    
                   <span id="Main_lblBusnPhone">55079896428</span>  
                
				</div></td>
			</tr><tr id="Main_trCountryOfOrigin">
				<td><span id="Main_lblCountryOfOriginTitle" style="color:DarkBlue;">Country of Origin </span></td><td><span id="Main_lblCountryOfOrigin">United States</span></td>
			</tr>
		</tbody>

Since the info I want is always in the same spot on the screen, I was just going to record copying that part.

Then I would click "close," and then the "View Details" in the next row, repeating until the last row.
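
Since each value in the details box sits in an element with its own id (Main_lblName, Main_lblGradYear, and so on), an alternative to recording screen positions would be to read the fields by id. A minimal sketch, assuming those ids are the same for every person; readDetailFields and DETAIL_FIELD_IDS are made-up names, not part of the page:

```javascript
// Element ids copied from the details HTML posted above.
// The property names on the left are arbitrary labels chosen for this sketch.
const DETAIL_FIELD_IDS = {
  name: "Main_lblName",
  gradYear: "Main_lblGradYear",
  degreeProgram: "Main_lblDegreeProgram",
  employer: "Main_lblEmployer",
  jobTitle: "Main_lblJobTitle",
  industry: "Main_lblIndustry",
};

// `doc` is anything that offers getElementById, e.g. the browser's `document`.
function readDetailFields(doc, fieldIds = DETAIL_FIELD_IDS) {
  const record = {};
  for (const [key, id] of Object.entries(fieldIds)) {
    const el = doc.getElementById(id);
    // A missing element becomes an empty string; runs of whitespace
    // (the page pads some values) are collapsed to single spaces.
    record[key] = el ? el.textContent.replace(/\s+/g, " ").trim() : "";
  }
  return record;
}
```

In the browser this would be called as readDetailFields(document) while the details box is open.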

Thanks for the HTML.
However, we need HTML code above the start of the table -- usually two or three <div> elements above so that the proper JavaScript querySelector() can be designed.

Do you mean "Click to View"?
I don't see a "View Detail".

It's right in the beginning:

<a id="Main_gvSearchResults_lbDetail_0" href="javascript:__doPostBack(&#39;ctl00$Main$gvSearchResults$ctl02$lbDetail&#39;,&#39;&#39;)" style="color:DarkBlue;">View Detail</a>
        </td><td>Michael</td><td>Scott</td><td>GRAD  2009  </td><td>

Actually, it looks like "View Detail" and "Click to View" open the same popup.

Does this help?

<script type="text/javascript">
    $(document).ready(function() {
    $(".widget-content dt a[href^='http://']").attr({target:"_blank",title:"Opens in a new window"}).append(" <span class='external'>External link</span>");
	$(".widget-content dt a[href^='https://']").attr({target:"_blank",title:"Opens in a new window"}).append(" <span class='external'>External link</span>");
	$(".widget-content dt a[href$='.pdf']").attr({title:"Download PDF"}).append(" <span class='pdf'>PDF</span>");
    $(".widget-content dt a[href$='.doc'],.widget-content dt a[href$='.docx']").attr({title:"Download Word file"}).append(" <span class='word'>Word</span>");
    $(".widget-content dt a[href$='.xls'],.widget-content dt a[href$='.xlsx']").attr({title:"Download Excel file"}).append(" <span class='excel'>Excel</span>");
    $(".widget-content dt a[href$='.ppt'],.widget-content dt a[href$='.pptx']").attr({title:"Download Powerpoint file"}).append(" <span class='ppt'>Powerpoint</span>");
    $(".widget-content dt a[href$='#']").attr({title:"Need link!"}).append("<br />&nbsp;&nbsp;&nbsp;<span style='color:red'>Need link!</span>");
    $(".widget-content dd:contains('AddContentHere')").prev("dt").append("<br />&nbsp;&nbsp;&nbsp;<span style='color:darkorange'>Need description!</span>");
});
</script>

            <div id="topbar">
                <div id="topbar-content">
                    <h1><a id="Banner_hplH1" href="http://www.dundermifflin.com/"></a></h1>
                    <h2><a id="Banner_hplH2" href="../MOR/">Welcome to DF Online Resources</a></h2>
 
                    <div id="user-info">
                        <p class="greeting">
                            Welcome
                            <span id="Banner_lblUserName" style="color:#404040;font-weight:bold;">Mickey Mouse</span>
                            &nbsp;&nbsp;&nbsp;&nbsp;<span class="emulate">
                            </span>
                            <br>to DF Online Resources&nbsp;&nbsp;&nbsp;&nbsp;<a id="Banner_hplRegister" class="register" href="/MOR/public/LogOff.aspx">Log Off</a>
                            &nbsp;&nbsp;&nbsp;&nbsp;<a id="Banner_hplEmulate"></a>
                        </p>
                    </div>
                </div>
            </div>
            <input type="hidden" name="ctl00$Banner$hfTab" id="Banner_hfTab">
     
        
              
        
	<div id="wrapper">
	    <div id="content">     
            <div id="tabs">	  
              	    
            	            

        		        <ul id="tabs-nav">
					        <li class="ui-tabs-selected">
					        <a id="navTabs_hplHome" href="../MOR/"><span>Home</span></a></li>

					        
					        	
					        
					        <li>
					        <a id="navTabs_hplCareerServices" href="Default_CareerServices.aspx"><span>Career Management</span></a></li>
					        
					        	
					        	
					        
					        <li>
					        <a id="navTabs_hplEmployee" href="Default_Staff.aspx"><span>Employee Applications</span></a></li>
					        					        		
					        
					        <li id="SearchTab"><a href="/MOR/search.aspx"><span>Search Directory</span></a></li>
					        	 
					        							        
					        													
				        </ul>

            
	       
            





 <div id="pageTitle">
  <span id="Main_lblTitle">&nbsp; &nbsp; &nbsp;Search Results</span>
 </div>

  <table border="0" cellspacing="0" cellpadding="0" style="width:100%">
     <tbody><tr><td border="0" class="form_tab_start_on"></td>
   <td class="tbl_haut">
      <table id="Main_Tab_tblTabs" cellspacing="0" cellpadding="0" style="border-width:0px;border-collapse:collapse;">
	<tbody><tr id="Main_Tab_trTabs">
		<td class="form_tab_on"><a class="A_Tab" href="/MOR/search.aspx">Search DF Directory</a></td><td class="form_tab_end_on"></td>
	</tr>
</tbody></table>
   </td>
   <td class="tbl_haut_d">
   </td>


</tr>
     <tr>
        <td class="tbl_g"></td>
        <td class="form_bgcolor" style="padding: 10px 10px 10px 10px;">   

 
 
 
 <!--<input type=button value="Close Window" onClick="self.close()"> -->
 <br>

 

        <div id="Main_updatePanel">
	     

 <div>
		<table cellspacing="1" id="Main_gvSearchResults" style="font-size:X-Small;">
			<thead>
				<tr>
					<th scope="col">&nbsp;</th><th scope="col" style="font-weight:bold;">FirstName</th><th scope="col" style="font-weight:bold;">LastName</th><th scope="col" style="font-weight:bold;">GradYear</th><th scope="col" style="font-weight:bold;">PrefEmail</th><th scope="col" style="font-weight:bold;">AltEmail</th><th scope="col" style="font-weight:bold;">LocalAddr</th><th scope="col" style="font-weight:bold;">LocalPhone</th><th scope="col" style="font-weight:bold;">JobTitle</th><th scope="col" style="font-weight:bold;">EmpName</th><th scope="col" style="font-weight:bold;">BusnAddr</th><th scope="col" style="font-weight:bold;">BusnPhone</th>
				</tr>
			</thead>

Yes. I now have a semi-working web page.

Does this look basically like the real web page?

But I need you to post a screenshot of the popup that is displayed when you click on "View Detail", and indicate which fields you want to copy.

Also, please confirm that the "View Detail" link looks like this for each row, except that the number at the end of the ID changes:

<a id="Main_gvSearchResults_lbDetail_0"

So the second link would be:

<a id="Main_gvSearchResults_lbDetail_1"
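
If the ids do follow that pattern, an attribute selector on the id prefix should pick up every "View Detail" link at once. A minimal sketch, assuming the table keeps the id Main_gvSearchResults shown in the posted HTML; findDetailLinks and rowIndexOf are hypothetical helper names:

```javascript
// Matches every <a> inside the results table whose id starts with the
// "View Detail" prefix, whatever number follows it.
const DETAIL_LINK_SELECTOR =
  '#Main_gvSearchResults a[id^="Main_gvSearchResults_lbDetail_"]';

// `root` is anything that offers querySelectorAll, e.g. the browser's `document`.
function findDetailLinks(root) {
  return Array.from(root.querySelectorAll(DETAIL_LINK_SELECTOR));
}

// Reads the trailing number from a link id; returns -1 if there is none.
function rowIndexOf(link) {
  const match = /_lbDetail_(\d+)$/.exec(link.id);
  return match ? Number(match[1]) : -1;
}
```

On the real page, findDetailLinks(document).length would be a quick way to check that all rows were found.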

You are so incredibly helpful. Thank you so much.
The link for the first item is: javascript:__doPostBack('ctl00$Main$gvSearchResults$ctl02$lbDetail','')

The link for the second item is:
javascript:__doPostBack('ctl00$Main$gvSearchResults$ctl03$lbDetail','')
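
Going by these links and the row HTML posted earlier, the element id counts rows from 0 while the ctlNN part of the postback target starts at 02. A small sketch of that mapping, assuming the pattern holds for every row (only the first, second, and last targets and the first element id appear in this thread); detailLinkFor is a made-up name:

```javascript
// Maps a zero-based row index to the element id and the __doPostBack target
// of that row's "View Detail" link. The offset of 2 is inferred from the
// first row (id ..._0, target ...ctl02...) and is an assumption for the rest.
function detailLinkFor(rowIndex) {
  if (!Number.isInteger(rowIndex) || rowIndex < 0) {
    throw new RangeError("rowIndex must be a non-negative integer");
  }
  const ctl = String(rowIndex + 2).padStart(2, "0");
  return {
    elementId: `Main_gvSearchResults_lbDetail_${rowIndex}`,
    postBackTarget: `ctl00$Main$gvSearchResults$ctl${ctl}$lbDetail`,
  };
}
```

With 63 rows, the indexes run from 0 to 62, which lines up with the ctl64 seen in the last link.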

I've included a screenshot below. I deleted the data. Right now I just manually copy everything in the box.

Please see my reply in a PM.

I think it would be more efficient and faster to first get all of the data, and then paste it to your target document.

I think I can get all of the detailed data for all persons, and return it as a tab-delimited list, one line per person.
This should make it very flexible and easy to paste into a variety of documents.
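
As a rough illustration of that output format, here is a sketch that turns one object per person into tab-delimited lines. The function names and the column names are assumptions made for this example, not the final macro:

```javascript
// Builds one tab-delimited line from a record, in the given column order.
// Tabs and line breaks inside a value are replaced with a space so that
// each person stays on exactly one line.
function toTsvLine(record, columns) {
  return columns
    .map((col) => String(record[col] ?? "").replace(/[\t\r\n]+/g, " ").trim())
    .join("\t");
}

// One line per person, joined with newlines.
function toTsv(records, columns) {
  return records.map((record) => toTsvLine(record, columns)).join("\n");
}
```

A multi-line address such as the one in the details box would come out on a single line, with its line breaks turned into spaces.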

BUT, I need the entire HTML source code for the page, after you have done this:

  1. Click on "View Details" for first person.
  2. Close Details popup
  3. Click on "View Details" for second person.
  4. While that popup is still open, save the page HTML code.
    • In Chrome, use File > Save Page As... to save it as an HTML file.
  5. Zip this HTML file and upload here. You can upload to a PM if you prefer.

I need the entire HTML file to properly develop and test.
