URL Parser Class


Topics:

Overview
Enumerations
Data Structures
Functions


Overview

The gxsURL class is used to parse uniform resource locators. URL information is extracted in the following format:

protocol://username:password@hostname:port/path/filename


Enumerations

// The following list of URL protocols is a combination of
// standard and non-standard URI schemes taken from:
// http://www.w3.org/pub/WWW/Addressing/schemes.html
enum { // Recognized URL protocols
  gxsURL::gxs_Unknown_URL_protocol = 0, // Protocol is not known
  gxsURL::gxs_about,       // Client-Side JavaScript Reference
  gxsURL::gxs_acap,        // ACAP -- Application Configuration Access
  gxsURL::gxs_afp,         // URLs for use with Service Location
  gxsURL::gxs_afs,         // Reserved, per Internet Standard 
  gxsURL::gxs_callto,      // NetMeeting Hyperlink on a Web Page
  gxsURL::gxs_chttp,       // RealPlayer Caching Protocol 
  gxsURL::gxs_cid,         // Content-ID and Message-ID  
  gxsURL::gxs_clsid,       // Identifies OLE/COM classes 
  gxsURL::gxs_data,        // Data: URL scheme
  gxsURL::gxs_file,        // Host-specific file names URL RFC  
  gxsURL::gxs_finger,      // Finger protocol URL
  gxsURL::gxs_ftp,         // File Transfer protocol URL
  gxsURL::gxs_gopher,      // Gopher protocol URL
  gxsURL::gxs_hdl,         // CNRI handle system 
  gxsURL::gxs_http,        // Hypertext Transfer Protocol URL 
  gxsURL::gxs_https,       // HTTP over SSL (Secure Socket Layer)
  gxsURL::gxs_iioploc,     // Interoperable Naming Joint Revised Sub
  gxsURL::gxs_ilu,         // ILU types, string binding handles 
  gxsURL::gxs_imap,        // IMAP URL scheme 
  gxsURL::gxs_ior,         // CORBA interoperable object reference 
  gxsURL::gxs_java,        // Identifies Java classes 
  gxsURL::gxs_javascript,  // Client-Side JavaScript Reference
  gxsURL::gxs_jdbc,        // Used in Java SQL API 
  gxsURL::gxs_ldap,        // An LDAP URL Format
  gxsURL::gxs_lifn,        // BFD -- Bulk File distribution
  gxsURL::gxs_mailto,      // Electronic mail address
  gxsURL::gxs_mid,         // Content-ID and Message-ID 
  gxsURL::gxs_news,        // USENET news
  gxsURL::gxs_nfs,         // NFS URL Scheme
  gxsURL::gxs_nntp,        // USENET news using NNTP access URL
  gxsURL::gxs_path,        // Path spec 
  gxsURL::gxs_pop,         // POP URL Scheme
  gxsURL::gxs_pop3,        // A POP3 URL Interface
  gxsURL::gxs_printer,     // Definition of printer
  gxsURL::gxs_prospero,    // Prospero Directory Service URL
  gxsURL::gxs_res,         // Res Protocol
  gxsURL::gxs_rtsp,        // Real Time Streaming Protocol (RTSP)
  gxsURL::gxs_rvp,         // Rendezvous Protocol
  gxsURL::gxs_rlogin,      // Remote login
  gxsURL::gxs_rwhois,      // The RWhois Uniform Resource Locator
  gxsURL::gxs_rx,          // Remote Execution
  gxsURL::gxs_sdp,         // SDP URL Scheme
  gxsURL::gxs_service,     // Service Templates and service
  gxsURL::gxs_sip,         // SIP URL Scheme
  gxsURL::gxs_shttp,       // Secure http
  gxsURL::gxs_snews,       // NNTP over SSL
  gxsURL::gxs_stanf,       // Stable Network Filenames 
  gxsURL::gxs_telnet,      // Reference to interactive sessions URL RFC 
  gxsURL::gxs_tip,         // Transaction Internet Protocol Version 3.0
  gxsURL::gxs_tn3270,      // Reserved, per Internet Standard 
  gxsURL::gxs_tv,          // Television Broadcasts
  gxsURL::gxs_uuid,        // The UUID addressing scheme
  gxsURL::gxs_wais,        // Wide Area Information Servers URL 
  gxsURL::gxs_whois,       // Distributed directory service
  gxsURL::gxs_whodp        // WhoDP: Widely Hosted Object Data Protocol
};


Data Structures

// Structure containing info on a URL. 
struct gxsURLInfo
{
  // URL information
  gxString url;             // Unchanged URL 
  gxString proto;           // URL protocol 
  gxString host;            // Extracted hostname 
  gxString path, dir, file; // Path, as well as directory and file 
  gxString user, passwd;    // Username and password 
  gxsURLInfo *proxy;        // The exact string to pass to proxy server 
  gxString local;           // The local filename of the URL document
  gxString referer;         // Source from which the requested URI was obtained
  int port;                 // Port number
  int proto_type;           // Enumerated value representing a protocol  
  char ftp_type;            // FTP type

  // Members used by HTTP clients
  gxString parent_directory; // This resource's parent directory
  gxString local_file;       // This resource's local file
};


Functions

gxsURL::gxsURL()
gxsURL::~gxsURL()
gxsURL::CleanUserName()
gxsURL::GetPortNumber()
gxsURL::GetProtocolString()
gxsURL::GetProtocolType()
gxsURL::HasFile()
gxsURL::HasProtocol()
gxsURL::ParseDirectory()
gxsURL::ParseHostName()
gxsURL::ParsePortNumber()
gxsURL::ParseProtocol()
gxsURL::ParseURL()
gxsURL::ParseUserName()
gxsURL::ProcessFTPType()

gxsURL::gxsURL() - Default class constructor.

gxsURL::~gxsURL() - Class destructor.

int gxsURL::CleanUserName(const gxString &url, gxString &clean_url) - Public member function used to remove the username and password string from a URL and pass back a clean URL in the "clean_url" variable. Returns false if the URL does not contain a username or password.

int gxsURL::GetPortNumber(const gxString &url, int &port) - Public member function used to obtain a port number according to the protocol specified in the URL. If the port number cannot be determined or is not known, this function will return false and set the "port" variable to 80.
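
The contract described above (look up a port by protocol, fall back to port 80 and return false when the protocol is not recognized) can be illustrated with a small lookup table. This is only a sketch built on std::string, not the gxsURL implementation: the function name "DefaultPortForUrl", the table contents, and the case-insensitive comparison are all assumptions made for illustration, and the sketch ignores any explicit port written in the URL.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of gxsURL: maps the scheme at the start of
// a URL to its well-known port. Returns false and sets port to 80 when the
// scheme is missing or not in the table.
bool DefaultPortForUrl(const std::string &url, int &port) {
  struct Entry { const char *scheme; int port; };
  static const Entry table[] = {
    {"ftp", 21},   {"telnet", 23}, {"gopher", 70}, {"http", 80},
    {"nntp", 119}, {"imap", 143},  {"ldap", 389},  {"https", 443}
  };

  port = 80; // Fallback value used for every failure path
  const std::string::size_type colon = url.find(':');
  if (colon == std::string::npos) return false;

  // Lowercase the scheme so "FTP://" and "ftp://" compare equal
  std::string scheme = url.substr(0, colon);
  for (char &c : scheme) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }

  for (const Entry &e : table) {
    if (scheme == e.scheme) {
      port = e.port;
      return true;
    }
  }
  return false;
}
```

A caller would check the return value before trusting the port, since 80 is reported both for HTTP URLs and for every unrecognized protocol.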

char *gxsURL::GetProtocolString(int protocol) - Public member function that returns a null terminated string corresponding to the specified protocol. The "protocol" variable must equal one of the integer constants defined in the URL protocol enumeration.

int gxsURL::GetProtocolType(const gxString &protocol) - Public member function used to identify the specified protocol string and tag it with one of the integer constants defined in the URL protocol enumeration.
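
GetProtocolString() and GetProtocolType() are inverse lookups between an enumerated constant and its name. One common way to keep the two directions consistent is a single name table indexed by the enumerator, sketched below. The enumeration "Proto", the table "kProtoNames", and both function names are hypothetical and cover only a handful of protocols; the sketch makes no claim about how gxsURL stores its names, and it compares names case-sensitively.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical subset of a protocol enumeration; 0 means "unknown".
enum Proto { kUnknown = 0, kFtp, kHttp, kHttps, kMailto };

// Names indexed by enumerator, so both lookups share one table.
static const char *const kProtoNames[] = {
  "unknown", "ftp", "http", "https", "mailto"
};
static const std::size_t kNumProtos =
  sizeof(kProtoNames) / sizeof(kProtoNames[0]);

// Returns the null terminated name for an enumerator, or "unknown" when
// the value is out of range.
const char *ProtoToString(int proto) {
  if (proto < 0 || static_cast<std::size_t>(proto) >= kNumProtos) {
    proto = kUnknown;
  }
  return kProtoNames[proto];
}

// Returns the enumerator matching the name, or kUnknown if none matches.
int StringToProto(const char *name) {
  for (std::size_t i = 1; i < kNumProtos; ++i) {
    if (std::strcmp(name, kProtoNames[i]) == 0) return static_cast<int>(i);
  }
  return kUnknown;
}
```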

int gxsURL::HasFile(const gxString &path, gxString &dir, gxString &file) - Public member function that returns true if the path has a file associated with it. The directory and file name will be passed back in the "dir" and "file" variables.

int gxsURL::HasFile(const gxString &path) - Public member function that returns true if the path has a file associated with it.
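
The directory and file split performed by HasFile() can be pictured as cutting the path at its last forward slash. The sketch below uses std::string and a hypothetical function name, "SplitPath". It assumes that a path "has a file" whenever non-empty text follows the last slash, which is an assumption for illustration; the rule gxsURL actually applies is not stated in this document.

```cpp
#include <string>

// Hypothetical helper, not part of gxsURL: splits a path at its last '/'.
// The directory keeps its trailing slash. Returns true when non-empty text
// follows the last slash (the assumed meaning of "has a file").
bool SplitPath(const std::string &path, std::string &dir, std::string &file) {
  const std::string::size_type slash = path.rfind('/');
  if (slash == std::string::npos) {
    dir.clear();   // No slash at all: the whole path is the file name
    file = path;
  } else {
    dir = path.substr(0, slash + 1);
    file = path.substr(slash + 1);
  }
  return !file.empty();
}
```

Under this assumption "/docs/api/index.html" yields the directory "/docs/api/" and the file "index.html", while "/docs/api/" yields an empty file name and a false return value.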

int gxsURL::HasProtocol(const gxString &url) - Public member function that returns a protocol type defined in the URL protocol enumeration if the URL begins with a protocol.

int gxsURL::ParseDirectory(gxsURLInfo &u) - Public member function used to build the directory and filename components of the path specified in the gxsURLInfo object. Returns true if the path has a file associated with it.

int gxsURL::ParseDirectory(const gxString &url, gxString &path, gxString &dir, gxString &file) - Public member function used to build the directory and filename components of the path in the specified URL. Returns true if the path has a file associated with it. The path, directory, and file name will be passed back in the "path", "dir", and "file" variables.

int gxsURL::ParseHostName(const gxString &url, gxString &host, int remove_port_number = 1) - Public member function used to parse the hostname from a URL. If the "remove_port_number" variable is true, the port number will be removed from the hostname if a port number was specified. Returns true if a valid hostname was found in the string containing the URL.

int gxsURL::ParsePortNumber(const gxString &url, int &port) - Public member function used to parse a port number from a URL and pass back the value in the "port" variable. Returns true if a port number was found.

int gxsURL::ParseProtocol(const gxString &url, gxString &proto_name, int &proto_type) - Public member function used to parse the protocol string in the specified URL. Passes back the protocol name in the "proto_name" variable and a protocol type defined in the URL protocol enumeration in the "proto_type" variable. Returns false if no protocol is found in the URL.

int gxsURL::ParseURL(const gxString &url, gxsURLInfo &u, int strict = 0) - Public member function used to parse the specified URL, which is expected in the following format:

protocol://username:password@hostname:port/path/filename

Extracts the hostname terminated with a forward slash or colon. Extracts the port number terminated with a forward slash, or selects a port according to the protocol if no port number is specified. The directory name equals everything after the hostname. The URL information will be passed back in the "u" variable. Returns false if any errors occur during the parsing operation. If the "strict" variable is true, this function will return false if an unknown protocol is specified in the URL.
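
The parsing order described above can be sketched with std::string as follows. This is an illustration under stated assumptions, not the gxsURL code: the "UrlParts" structure and the "SplitUrl" function are hypothetical, the sketch requires a "://" separator, it reports a missing port as -1 instead of selecting one from the protocol, and it does not handle bracketed IPv6 hosts.

```cpp
#include <string>

// Hypothetical stand-in for a few gxsURLInfo fields; not the library's type.
struct UrlParts {
  std::string proto, user, passwd, host, path;
  int port = -1; // -1 means the URL did not name a port
};

// Splits "protocol://username:password@hostname:port/path/filename".
// Returns false if the separator, the hostname, or a valid port is missing.
bool SplitUrl(const std::string &url, UrlParts &out) {
  typedef std::string::size_type pos_t;
  const pos_t npos = std::string::npos;
  out = UrlParts();

  const pos_t sep = url.find("://");
  if (sep == npos || sep == 0) return false;
  out.proto = url.substr(0, sep);

  // The authority runs from "://" to the next forward slash
  const pos_t auth_begin = sep + 3;
  const pos_t slash = url.find('/', auth_begin);
  std::string authority = (slash == npos)
    ? url.substr(auth_begin)
    : url.substr(auth_begin, slash - auth_begin);
  out.path = (slash == npos) ? "/" : url.substr(slash);

  // Optional "username:password@" prefix
  const pos_t at = authority.rfind('@');
  if (at != npos) {
    const std::string userinfo = authority.substr(0, at);
    authority.erase(0, at + 1);
    const pos_t user_end = userinfo.find(':');
    out.user = userinfo.substr(0, user_end);
    if (user_end != npos) out.passwd = userinfo.substr(user_end + 1);
  }

  // Optional ":port" suffix; digits only, at most 65535
  const pos_t colon = authority.rfind(':');
  if (colon != npos) {
    const std::string digits = authority.substr(colon + 1);
    if (digits.empty() || digits.size() > 5 ||
        digits.find_first_not_of("0123456789") != npos) {
      return false;
    }
    out.port = std::stoi(digits);
    if (out.port > 65535) return false;
    authority.erase(colon);
  }

  out.host = authority;
  return !out.host.empty();
}
```

Splitting the authority off first keeps the later steps simple: the username, password, and port searches never see characters from the path, so an "@" or ":" inside a file name cannot be mistaken for a delimiter.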

int gxsURL::ParseURL(const char *url, gxsURLInfo &u, int strict = 0) - Public member function used to parse the specified URL, passed as a null terminated string.

int gxsURL::ParseUserName(const gxString &url, gxString &user, gxString &passwd, gxString &clean_url) - Public member function used to find the optional username and password within the URL, as per RFC 1738. Returns false if the URL does not contain a username or password. Passes back a URL without the username and password information in the "clean_url" variable.
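
In RFC 1738 the username and password sit between the "//" and an "@" that precedes the hostname, and any ":", "@", or "/" inside them must be encoded. That makes the first "@" before the first path slash a safe place to cut. The sketch below shows the idea with std::string; the function name "StripUserInfo" is hypothetical, the code is not taken from gxsURL, and it assumes a "//"-style URL (a "mailto:" address, for example, would not be handled correctly).

```cpp
#include <string>

// Hypothetical helper, not part of gxsURL: extracts "user[:password]@"
// from a URL and passes back the URL without it. Returns false, leaving
// clean_url equal to url, when no user information is present.
bool StripUserInfo(const std::string &url, std::string &user,
                   std::string &passwd, std::string &clean_url) {
  typedef std::string::size_type pos_t;
  const pos_t npos = std::string::npos;
  user.clear();
  passwd.clear();
  clean_url = url;

  const pos_t sep = url.find("://");
  const pos_t auth_begin = (sep == npos) ? 0 : sep + 3;
  const pos_t auth_end = url.find('/', auth_begin); // npos if no path

  // An '@' only counts when it appears before the path begins
  const pos_t at = url.find('@', auth_begin);
  if (at == npos || (auth_end != npos && at > auth_end)) return false;

  const std::string userinfo = url.substr(auth_begin, at - auth_begin);
  const pos_t colon = userinfo.find(':');
  user = userinfo.substr(0, colon);
  if (colon != npos) passwd = userinfo.substr(colon + 1);

  // Rejoin the text before and after the user information
  clean_url = url.substr(0, auth_begin) + url.substr(at + 1);
  return true;
}
```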

void gxsURL::ProcessFTPType(const gxString &url, char &ftp_type) - Public member function used to determine the FTP type.


End Of Document