========================= fantomas shadowMaker(TM) =========================

ver. 2.02.01
Date of release: 2006-02-12

************************************
* ONLINE HELP IS AVAILABLE AFTER   *
* SUCCESSFUL INSTALLATION BY       *
* CLICKING ON THE FANTOMAS LOGO!   *
************************************

SYSTEM REQUIREMENTS
INSTALLATION - UNIX
UNINSTALLING THE PROGRAM
WORKING WITH shadowMaker(TM)
ERROR HANDLING
KNOWN ISSUES
UPDATES + PROGRAM CHRONOLOGY
CONTACT + SUPPORT

======================================================================

SYSTEM REQUIREMENTS
-------------------

Language
--------
Perl 5

UNIX
----
The Unix system requires an installed web server. Execution of CGI
scripts and Server Side Includes (SSI) must be enabled. A directory for
the execution of CGI scripts must exist; usually, this will be the
directory /cgi-bin/. The module mod_rewrite and .htaccess functionality
must be available.

Depending on the number of keywords/search phrases, the total time of
the spidering processes could be several hours, so the execution of
long-running Unix processes must be allowed.

Tested under: Red Hat Linux with Apache

Browser
-------
Scripts are called and executed via web browser. You will currently
achieve best results under MS Internet Explorer 5+. Netscape 4.7 may
require adjustment of font size.
Tested under: IE6+, Netscape 4.7, Netscape 7.0, Opera 6.0

======================================================================

INSTALLATION
------------

The following files are included:

shadowmaker.cgi --- (main menu script)
buildcsv.cgi --- (program script)
geturls.cgi --- (program script)
getcontent.cgi --- (program script)
createsite.cgi --- (program script)
sdcontrol.cgi --- (program script)
submitpages.cgi --- (program script)
monitor.cgi --- (program script)
SMSetup.pm --- (Setup file)
SMLIB.pm --- (Script routines)
PARSER.pm --- (Parsing routines)
locs_en.txt --- (Language file)
trademarks_en.txt --- (Trademarks exclusion list)
company_names_en.txt --- (Company names exclusion list)
family_names_en.txt --- (Family names exclusion list)
first_names_en.txt --- (First names exclusion list)
miscellaneous_en.txt --- (Miscellaneous exclusion list)
enginelist.txt --- (search engine file - expandable)
fantomas.gif --- (logo/graphics file)
fa_license-e.txt --- (License Agreement And Terms Of Usage - PLEASE READ!)
help-batcheditor-6_en.html --- (online help file)
help-exclusionlists-3a_en.html --- (online help file)
help-exclusionlists-3b_en.html --- (online help file)
help-exclusionlists-3c_en.html --- (online help file)
help-exclusionlists-3d_en.html --- (online help file)
help-filenames-1_en.html --- (online help file)
help-filenames-2_en.html --- (online help file)
help-filenames-2a_en.html --- (online help file)
help-filenames-2b_en.html --- (online help file)
help-filenames-3_en.html --- (online help file)
help-filenames-4_en.html --- (online help file)
help-filenames-5_en.html --- (online help file)
help-filenames-6_en.html --- (online help file)
help-functions-3_en.html --- (online help file)
help-functions-4a_en.html --- (online help file)
help-functions-4b_en.html --- (online help file)
help-keywordlist_en.html --- (online help file)
help-landingpage_en.html --- (online help file)
help-main_en.html --- (online help file)
help-monitor-tools-1_en.html --- (online help file)
help-param_en.html --- (online help file)
help-phantompages-4_en.html --- (online help file)
help-phantompages-4b_en.html --- (online help file)
help-phantompages-4c_en.html --- (online help file)
help-phantompages-4d_en.html --- (online help file)
help-sdconfigure-5a_en.html --- (online help file)
help-sdconfigure-5b_en.html --- (online help file)
help-selecturls-3_en.html --- (online help file)
help-spyfetcher-6_en.html --- (online help file)
help-status-functions-2_en.html --- (online help file)
help-status-functions-5_en.html --- (online help file)
help-status-functions-6_en.html --- (online help file)
smhelp_en.txt --- (documentation in TXT format) <--- THIS FILE YOU ARE READING!
sm-tutorial-1_en.html --- (tutorial file)
sm-tutorial-2_en.html --- (tutorial file)
sm-tutorial-3_en.html --- (tutorial file)
sm-tutorial-4_en.html --- (tutorial file)
sm-tutorial-5_en.html --- (tutorial file)
sm-tutorial-6_en.html --- (tutorial file)
sm-tutorial-index_en.html --- (tutorial file)
sm-tutorial-tools_en.html --- (tutorial file)

-----------------------------------
ADJUSTMENTS IN FILE "SMSetup.pm"
(please edit in ASCII or plain text editor like Notepad etc.)
-----------------------------------

UNIX
----

System Path
-----------
* Please check your system's path to the location of Perl.
  The default path in the script is "/usr/bin/perl".
  If you don't know this path, you can check it out under telnet by
  entering the Unix command "whereis perl". The system path will then be
  displayed for you to copy if required (see below).

  If your system path to Perl is not "/usr/bin/perl", you will have to
  adjust the first line accordingly in the following scripts:

  - shadowmaker.cgi
  - buildcsv.cgi
  - geturls.cgi
  - getcontent.cgi
  - createsite.cgi
  - sdcontrol.cgi
  - submitpages.cgi
  - monitor.cgi

Configuration of Script Parameters (Variables)
----------------------------------------------
The following adjustments are related only to the parameter file
"SMSetup.pm".

* Please adjust the following variables to your requirements:

  - "$local_lang"
  - "$cgi_url"
  - "$from_mail"
  - "$to_mail"
  - "$sendmail"
  - "$doc_dir" (default = /helpfiles/)
  - "$graphics_dir" (default = /graphics/)
  - "$abs_admin_dir"

  A comprehensive description of these variables can be found below in
  chapter: "WORKING WITH fantomas shadowMaker(TM)"

The following adjustment is related only to the script file
"submitpages.cgi". Please adjust the absolute path to your CGI-BIN.

  Example: push(@INC, '/usr/www/htdocs/yourdomain/cgi-bin/');

FTP Upload Mode
---------------
* When uploading via FTP, make sure to transfer ALL files in ASCII mode
  (including ".pm" files!).
  EXCEPTION: the graphics file "fantomas.gif", which must be transferred
  in BINARY or AUTOMATIC mode.

Uploading Files To Your Web Server
----------------------------------
* The following scripts and files must be copied into the Unix server's
  CGI directory:

  - shadowmaker.cgi
  - buildcsv.cgi
  - geturls.cgi
  - getcontent.cgi
  - createsite.cgi
  - sdcontrol.cgi
  - submitpages.cgi
  - monitor.cgi
  - SMSetup.pm
  - SMLIB.pm
  - PARSER.pm

CGI Directory Permissions
-------------------------
* The CGI directory must have the following permissions:
  "chmod 755" [drwxr-xr-x]

Creating Subdirectories
-----------------------
* Next, create the following directories with the permissions
  "chmod 777" [drwxrwxrwx] BELOW your CGI directory:

  admin/
  admin/admin_parms/
  admin/admin_urls/
  admin/admin_submissions/
  admin/admin_links/
  admin/admin_locs/
  admin/admin_logs/
      [This admin_logs subdirectory will be used to store an error log
      file which will be created in case of script errors. This log file
      can be used either for debugging or you may send it for further
      analysis to: techsupport@fantomaster.com]
  tarballs/
  tmp/
  input/
  input/input_keywords/
  input/input_contents/
  input/input_descriptions/
  input/input_exclusions/

* Next, create the following directories with the permissions
  "chmod 755" [drwxr-xr-x] BELOW your MAIN directory:

  helpfiles/
  graphics/

Uploading the Exclusion List Files
----------------------------------
* Now, copy the following exclusion list files into the directory
  "input/input_exclusions/":

  - trademarks_en.txt
  - company_names_en.txt
  - family_names_en.txt
  - first_names_en.txt
  - miscellaneous_en.txt

Uploading the Engine List File
------------------------------
* Copy the following file into the directory "admin/admin_submissions/":

  - enginelist.txt

Uploading the Language File
---------------------------
* Copy the following file into the directory "admin/admin_locs/":

  - locs_en.txt

Uploading the Online Help Files
-------------------------------
* Finally, copy the following files
  into the directory "helpfiles/":

  - help-batcheditor-6_en.html
  - help-exclusionlists-3a_en.html
  - help-exclusionlists-3b_en.html
  - help-exclusionlists-3c_en.html
  - help-exclusionlists-3d_en.html
  - help-filenames-1_en.html
  - help-filenames-2_en.html
  - help-filenames-2a_en.html
  - help-filenames-2b_en.html
  - help-filenames-3_en.html
  - help-filenames-4_en.html
  - help-filenames-5_en.html
  - help-filenames-6_en.html
  - help-functions-3_en.html
  - help-functions-4a_en.html
  - help-functions-4b_en.html
  - help-keywordlist_en.html
  - help-landingpage_en.html
  - help-main_en.html
  - help-monitor-tools-1_en.html
  - help-param_en.html
  - help-phantompages-4_en.html
  - help-phantompages-4b_en.html
  - help-phantompages-4c_en.html
  - help-phantompages-4d_en.html
  - help-sdconfigure-5a_en.html
  - help-sdconfigure-5b_en.html
  - help-selecturls-3_en.html
  - help-spyfetcher-6_en.html
  - help-status-functions-2_en.html
  - help-status-functions-5_en.html
  - help-status-functions-6_en.html
  - smhelp_en.txt
  - sm-tutorial-1_en.html
  - sm-tutorial-2_en.html
  - sm-tutorial-3_en.html
  - sm-tutorial-4_en.html
  - sm-tutorial-5_en.html
  - sm-tutorial-6_en.html
  - sm-tutorial-index_en.html
  - sm-tutorial-tools_en.html

Uploading the Logo/Graphics File
--------------------------------
* Finally, copy the following file into the directory "graphics/":

  - fantomas.gif

Assigning Proper File Permissions
---------------------------------
* Assign the following required file permissions:

  shadowmaker.cgi: "chmod 755" [-rwxr-xr-x]
  buildcsv.cgi: "chmod 755" [-rwxr-xr-x]
  geturls.cgi: "chmod 755" [-rwxr-xr-x]
  getcontent.cgi: "chmod 755" [-rwxr-xr-x]
  createsite.cgi: "chmod 755" [-rwxr-xr-x]
  sdcontrol.cgi: "chmod 755" [-rwxr-xr-x]
  submitpages.cgi: "chmod 755" [-rwxr-xr-x]
  monitor.cgi: "chmod 755" [-rwxr-xr-x]
  SMSetup.pm: "chmod 444" [-r--r--r--]
  SMLIB.pm: "chmod 444" [-r--r--r--]
  PARSER.pm: "chmod 444" [-r--r--r--]
  locs_en.txt: "chmod 444" [-r--r--r--]
  trademarks_en.txt: "chmod 666" [-rw-rw-rw-]
  company_names_en.txt: "chmod 666" [-rw-rw-rw-]
  family_names_en.txt: "chmod 666" [-rw-rw-rw-]
  first_names_en.txt: "chmod 666" [-rw-rw-rw-]
  miscellaneous_en.txt: "chmod 666" [-rw-rw-rw-]
  enginelist.txt: "chmod 666" [-rw-rw-rw-]
  fantomas.gif: "chmod 444" [-r--r--r--]
  smhelp_en.txt: "chmod 444" [-r--r--r--]
  help-batcheditor-6_en.html: "chmod 444" [-r--r--r--]
  help-exclusionlists-3a_en.html: "chmod 444" [-r--r--r--]
  help-exclusionlists-3b_en.html: "chmod 444" [-r--r--r--]
  help-exclusionlists-3c_en.html: "chmod 444" [-r--r--r--]
  help-exclusionlists-3d_en.html: "chmod 444" [-r--r--r--]
  help-filenames-1_en.html: "chmod 444" [-r--r--r--]
  help-filenames-2_en.html: "chmod 444" [-r--r--r--]
  help-filenames-2a_en.html: "chmod 444" [-r--r--r--]
  help-filenames-2b_en.html: "chmod 444" [-r--r--r--]
  help-filenames-3_en.html: "chmod 444" [-r--r--r--]
  help-filenames-4_en.html: "chmod 444" [-r--r--r--]
  help-filenames-5_en.html: "chmod 444" [-r--r--r--]
  help-filenames-6_en.html: "chmod 444" [-r--r--r--]
  help-functions-3_en.html: "chmod 444" [-r--r--r--]
  help-functions-4a_en.html: "chmod 444" [-r--r--r--]
  help-functions-4b_en.html: "chmod 444" [-r--r--r--]
  help-keywordlist_en.html: "chmod 444" [-r--r--r--]
  help-landingpage_en.html: "chmod 444" [-r--r--r--]
  help-main_en.html: "chmod 444" [-r--r--r--]
  help-monitor-tools-1_en.html: "chmod 444" [-r--r--r--]
  help-param_en.html: "chmod 444" [-r--r--r--]
  help-phantompages-4_en.html: "chmod 444" [-r--r--r--]
  help-phantompages-4b_en.html: "chmod 444" [-r--r--r--]
  help-phantompages-4c_en.html: "chmod 444" [-r--r--r--]
  help-phantompages-4d_en.html: "chmod 444" [-r--r--r--]
  help-sdconfigure-5a_en.html: "chmod 444" [-r--r--r--]
  help-sdconfigure-5b_en.html: "chmod 444" [-r--r--r--]
  help-selecturls-3_en.html: "chmod 444" [-r--r--r--]
  help-spyfetcher-6_en.html: "chmod 444" [-r--r--r--]
  help-status-functions-2_en.html: "chmod 444" [-r--r--r--]
  help-status-functions-5_en.html: "chmod 444" [-r--r--r--]
  help-status-functions-6_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-1_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-2_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-3_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-4_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-5_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-6_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-index_en.html: "chmod 444" [-r--r--r--]
  sm-tutorial-tools_en.html: "chmod 444" [-r--r--r--]

======================================================================

UNINSTALLING THE PROGRAM
------------------------

For complete uninstall, delete the following files:

shadowmaker.cgi
buildcsv.cgi
geturls.cgi
getcontent.cgi
createsite.cgi
sdcontrol.cgi
submitpages.cgi
monitor.cgi
SMSetup.pm
SMLIB.pm
PARSER.pm
locs_en.txt
trademarks_en.txt
company_names_en.txt
family_names_en.txt
first_names_en.txt
miscellaneous_en.txt
enginelist.txt
fantomas.gif
smhelp_en.txt

Also, delete the following directories including all content:

admin/
admin/admin_parms/
admin/admin_urls/
admin/admin_submissions/
admin/admin_links/
admin/admin_locs/
admin/admin_logs/
tarballs/
tmp/
input/
input/input_keywords/
input/input_contents/
input/input_descriptions/
input/input_exclusions/
helpfiles/
graphics/

Alternate Installation Scenarios
--------------------------------

The description given above explains the installation of the generator
module and a Shadow Domain(TM) residing on a mutually shared domain.
Beyond this, there are alternate real life installation scenarios which
are categorized respectively as: CENTRALIZED INSTALLATION and
DECENTRALIZED INSTALLATION.

All phantom pages can be generated from one single installed instance of
the fantomas shadowMaker(TM) generator module. Thus, this module need
only be implemented once.

In Step #5 "Check + Control Shadow Domain" you can control only those
SDs that are actually residing on the same single physical server. If
you wish to install SDs on multiple physical server systems, you will
need to install at least the Step 5 script (sdcontrol.cgi) on each
physical server.

If domains are configured in a self-contained manner on any given web
server system, you will not be able to open one domain's files from
another domain. In such a scenario, you will have to install the Step 5
script (sdcontrol.cgi) on each physical server involved.

If the Apache web server is configured to run as "suEXEC enabled", you
will only be able to read those files that are assigned to your own
domain or the domain from which the system is being accessed. In case
you are unsure whether "suEXEC enabled" is configured on your system or
not, please inquire either with your hosting provider or your system
administrator.

You will find more info about suEXEC here:
< http://httpd.apache.org/docs/2.0/suexec.html >

The distinction between central and decentralized installations also
applies to your deployment of the spiderSpy(TM) botBase. Wherever
possible, it is recommended to work from one central botBase instance
per physical server. Not only will this simplify the overall
installation and implementation procedure, it will also speed up the
regular updating process and improve system performance under high
visitor traffic load.

However, analogous to the above, if your Apache system is configured
with "suEXEC enabled", the botBase will have to be implemented
separately for each SD, i.e. in a decentralized manner, because under
this specific configuration no central file can be read across
different domains.

The following section outlines in detail which files will have to be
installed/implemented under the various scenarios.

Decentralized Installation of Step 5 "Check + Control Shadow Domain"
====================================================================

INSTALLATION
------------

The following files are required:

sdcontrol.cgi --- (program script)
SMSetup.pm --- (Setup file)
SMLIB.pm --- (Script routines)
PARSER.pm --- (Parsing routines)
locs_en.txt --- (Language file)
fantomas.gif --- (logo/graphics file)
fa_license-e.txt --- (License Agreement And Terms Of Usage - PLEASE READ!)
help-filenames-5_en.html --- (online help file)
help-param_en.html --- (online help file)
help-sdconfigure-5a_en.html --- (online help file)
help-spyfetcher-6_en.html --- (online help file)
help-status-functions-5_en.html --- (online help file)
smhelp_en.txt --- (documentation in TXT format) <--- THIS FILE YOU ARE READING!

-----------------------------------
ADJUSTMENTS IN FILE "SMSetup.pm"
(please edit in ASCII or plain text editor like Notepad etc.)
-----------------------------------

UNIX
----

System Path
-----------
* Please check your system's path to the location of Perl.
  The default path in the script is "/usr/bin/perl".
  If you don't know this path, you can check it out under telnet by
  entering the Unix command "whereis perl". The system path will then be
  displayed for you to copy if required (see below).

  If your system path to Perl is not "/usr/bin/perl", you will have to
  adjust the first line accordingly in the following script:

  - sdcontrol.cgi

Configuration of Script Parameters (Variables)
----------------------------------------------
The following adjustments are related only to the parameter file
"SMSetup.pm".

* Please adjust the following variables to your requirements:

  - "$local_lang"
  - "$cgi_url"
  - "$from_mail"
  - "$to_mail"
  - "$sendmail"
  - "$doc_dir" (default = /helpfiles/)
  - "$graphics_dir" (default = /graphics/)
  - "$abs_admin_dir"

  A comprehensive description of these variables can be found in
  chapter: "WORKING WITH fantomas shadowMaker(TM)"

FTP Upload Mode
---------------
* When uploading via FTP, make sure to transfer ALL files in ASCII mode
  (including ".pm" files!).
  EXCEPTION: the graphics file "fantomas.gif", which must be transferred
  in BINARY or AUTOMATIC mode.

Uploading Files To Your Web Server
----------------------------------
* The following scripts and files must be copied into the Unix server's
  CGI directory:

  - sdcontrol.cgi
  - SMSetup.pm
  - SMLIB.pm
  - PARSER.pm

CGI Directory Permissions
-------------------------
* The CGI directory must have the following permissions:
  "chmod 755" [drwxr-xr-x]

Creating Subdirectories
-----------------------
* Next, create the following directories with the permissions
  "chmod 777" [drwxrwxrwx] BELOW your CGI directory:

  admin/
  admin/admin_parms/
  admin/admin_locs/
  admin/admin_logs/

* Next, create the following directories with the permissions
  "chmod 755" [drwxr-xr-x] BELOW your MAIN directory:

  helpfiles/
  graphics/

Uploading the Language File
---------------------------
* Copy the following file into the directory "admin/admin_locs/":

  - locs_en.txt

Uploading the Online Help Files
-------------------------------
* Finally, copy the following files into the directory "helpfiles/":

  - help-filenames-5_en.html
  - help-param_en.html
  - help-sdconfigure-5a_en.html
  - help-spyfetcher-6_en.html
  - help-status-functions-5_en.html
  - smhelp_en.txt

Uploading the Logo/Graphics File
--------------------------------
* Finally, copy the following file into the directory "graphics/":

  - fantomas.gif

Assigning Proper File Permissions
---------------------------------
* Assign the following required file permissions:

  sdcontrol.cgi: "chmod 755" [-rwxr-xr-x]
  SMSetup.pm: "chmod 444" [-r--r--r--]
  SMLIB.pm: "chmod 444" [-r--r--r--]
  PARSER.pm: "chmod 444" [-r--r--r--]
  locs_en.txt: "chmod 444" [-r--r--r--]
  fantomas.gif: "chmod 444" [-r--r--r--]
  help-filenames-5_en.html: "chmod 444" [-r--r--r--]
  help-param_en.html: "chmod 444" [-r--r--r--]
  help-sdconfigure-5a_en.html: "chmod 444" [-r--r--r--]
  help-spyfetcher-6_en.html: "chmod 444" [-r--r--r--]
  help-status-functions-5_en.html: "chmod 444" [-r--r--r--]
  smhelp_en.txt: "chmod 444" [-r--r--r--]

======================================================================

UNINSTALLING THE PROGRAM
------------------------

For complete uninstall, delete the following files:

sdcontrol.cgi
SMSetup.pm
SMLIB.pm
PARSER.pm
locs_en.txt
fantomas.gif
smhelp_en.txt

Also, delete the following directories including all content:

admin/
admin/admin_parms/
admin/admin_locs/
admin/admin_logs/
helpfiles/
graphics/

------------------------------------------------------------------------

Centralized Installation of the fantomas spyFetcher(TM) Script
--------------------------------------------------------------

If you cannot work from a central instance of the fantomas spiderSpy(TM)
botBase, you will have to implement the botBase under each and every SD
separately. Accordingly, you are required to implement a dedicated
updating process for the fantomas spyFetcher(TM) update script per SD as
well.

To effect this, install the script spyfetcher.cgi on every SD. This
installation is described in detail in the accompanying doc file
"sfehelp_en.txt".

Note that the basic installation process will remain identical,
regardless of whether you are working from a centralized or a
decentralized setup, the only difference being the actual number of
installations you are required to conduct.
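
For the decentralized Step 5 installation described earlier in this
section, the directory layout and permissions can be sketched as a small
shell function. This is only an illustration, assuming a POSIX shell on
the server; the function name "setup_sd_dirs" and its two arguments (the
CGI directory and the MAIN directory) are hypothetical and not part of
the shadowMaker(TM) distribution. The directory names and chmod values
are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the program): create the
# directories needed for a decentralized Step 5 installation and apply
# the permissions given in this guide.
#   $1 = CGI directory, $2 = MAIN directory
setup_sd_dirs() {
    cgi_dir=$1
    main_dir=$2

    # Working directories BELOW the CGI directory: chmod 777
    for d in admin admin/admin_parms admin/admin_locs admin/admin_logs; do
        mkdir -p "$cgi_dir/$d" || return 1
        chmod 777 "$cgi_dir/$d" || return 1
    done

    # Help and graphics directories BELOW the MAIN directory: chmod 755
    for d in helpfiles graphics; do
        mkdir -p "$main_dir/$d" || return 1
        chmod 755 "$main_dir/$d" || return 1
    done

    # The CGI directory itself: chmod 755
    chmod 755 "$cgi_dir"
}
```

Called with the example path used elsewhere in this file, this would be
"setup_sd_dirs /usr/www/htdocs/yourdomain/cgi-bin /usr/www/htdocs/yourdomain".
The files themselves still have to be uploaded and given their own
permissions as listed above.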

======================================================================

WORKING WITH fantomas shadowMaker(TM)
-------------------------------------

Program Description
-------------------

fantomas shadowMaker(TM) - This power tool offers fully automatic
creation of unlimited Shadow Domains for highly effective and efficient
IP delivery (cloaking): starting with automatic generation of topical
fillertext content, through unlimited generation of cross linked phantom
pages (10+K per hour if you need that many) including navigation
elements and site maps, to automatic submission to the search engines.

6 Steps Overview
----------------

To generate a Shadow Domain with fantomas shadowMaker(TM), please follow
the 6 steps outlined below.

First, activate the MAIN MENU (file: shadowmaker.cgi) where you will
find a selection of functions as well as a link to the online tutorial.

Your starting point is a LIST OF TARGETED KEYWORDS/SEARCH PHRASES. This
is utilized in a multi-step process to create phantom pages optimized
for these keywords/search phrases, embedded in a topically relevant
semantic environment.

These pages are pure "spider fodder" as they will only ever be viewed by
search engine spiders. That is why their featured text is not required
to make "sense" to a human reader. In fact, they may look like total
gibberish! However, as the fantomas shadowMaker(TM) generates only
semantic text components ("words") which relate to the targeted search
phrases, the pages will look highly relevant to any machine reading
them, which is exactly what a search engine spider is.

Once these pages are ranked well in the search engines, visitors
following the search engines' query results (i.e. clicking on your
links) will be redirected via the Shadow Domain to your primary or Core
Domain.

NOTE: Please take the utmost care when compiling your list of targeted
keywords and search phrases as it is quite fundamental to the whole
process of creating your Shadow Domain and generating highly optimized
phantom pages and, hence, your search engine rankings!

Step 1
------
In Step 1 you will define the search terms you want to target as well as
the landing pages your human visitors shall be redirected to.

Typically, your visitors will arrive at your Shadow Domain (SD) by
clicking a pertinent link in a search engine's result page (SERP). The
search phrases your visitors actually entered for their search engine
queries will normally be included in their browser's "referrer"
variable. This referrer variable will be parsed to determine which
search phrases may be included. Depending on these keywords/search
phrases, visitors will be redirected to predefined pages on your Core
Domain (CD). Thus, you can define which target URL should be assigned to
which keyword/search phrase.

This part of Step 1 is optional. Should you decide to skip it, human
visitors will be redirected by default from the SD to one single URL
(the default URL) on your CD, e.g. the index page.

Step 2
------
Based on your list of keywords/search phrases, the program will
determine the most important URLs pertinent to these keywords as
displayed by the major search engines within their Top 30 search
results.

For every keyword/search phrase in your list, the search engines' SERPs
are called up to extract the first 30 URLs. Finally, a list of the
thousand most frequently referenced URLs is stored in a file for further
processing.

Step 3
------
Next, the URLs determined in Step 2 will be spidered. In a very
extensive process their content will be stripped of HTML tags,
sanitized, scrambled and sorted and, finally, stored. This is important
in order to avoid copyright infringements and to generate highly
relevant fillertext for your SD's phantom pages.
The actual mechanics involved in processing this content are described
in more detail below.

At the end of this process, you will have generated a highly topical
fillertext content file. This file's size will typically be several MBs.
To ensure optimal results, you will have to check this fillertext
content file and modify it by deleting inappropriate content where
detected.

Step 4
------
Next, your SD's pages will be generated. This process combines input
from three different sources:

1. your list of keywords/search phrases
2. a list of page descriptions (to be generated manually)
3. the fillertext content file generated under Step 3.

fantomas shadowMaker(TM) will now generate one dedicated phantom page
per keyword/search phrase in your list. Keywords/search phrases will be
integrated into their respective phantom pages' TITLE tag and the
phantom page's FILE NAME. Moreover, they will be included in the pages'
BODY TEXT at a predefined keyword density configurable by you.

The phantom page's body text will be constituted by phrases randomly
extracted from the extensive fillertext content generated under Step 3
as described above. Thus, there will never be any direct correlation
between a spidered URL's content and the phantom page generated during
this process.

All phantom pages generated by this randomized process will be unique.
This avoids the issue of working with duplicate content, a practice
generally frowned upon by the search engines, leading to suboptimal
rankings.

The theme or topic predetermined by your list of keywords/search phrases
will be fed into the phantom pages from your extensive fillertext
content file. In this manner, not only will you get keyword density
optimized phantom pages, they will also be topically relevant.

The phantom pages created in this manner will be cross linked amongst
each other to prevent generation of orphaned pages, as search engines
will not rank isolated pages well.

Step 5
------
Before the automatic search engine submission can be effected, the
phantom pages will have to be UPLOADED TO THE SHADOW DOMAIN. This can be
done either with a tarball (compressed file) generated in Step 4, or by
uploading the individual pages via FTP or Telnet.

On your SD, a "central keyword switch" script will be installed. Every
visitor arriving at your SD (regardless of which specific page is
actually being hit) will be checked by this script. If the script
determines that your visitor is a human surfer, said visitor will be
redirected to your CD. Should the visiting entity be a search engine
spider, it will be fed the phantom page's content. (No redirect for
search engine spiders!)

After implementing the Shadow Domain you should conduct a functionality
test. This is done during Step 5.

Step 6
------
After successful testing of your Shadow Domain's functionality under
Step 5, you can now configure automatic submission of your Shadow
Domain's pages in this step.

Detailed Description
--------------------

The following section describes the scripts constituting the fantomas
shadowMaker(TM) program in detail.

Each script features an administrative section covering parameter
files. It is displayed at the top of the respective HTML templates. This
administrative block consists of three parts.

1. Button "Save parameter file"
-------------------------------
Whenever a fantomas shadowMaker(TM) script is called up for the first
time, the field "Current parameter file" is empty by default. The field
will display a file name only after a parameter file has been generated.
The name of the parameter file used last will be stored so you can work
on it whenever calling the script another time.

By clicking this button, you will save the current parameter file. The
name is displayed in the field "Current parameter file".

2. Button "Create new parameter file"
-------------------------------------
All variables are stored in a parameter file.
Hence, this file should be created first. You can enter the parameter
file's name in the field to the left of the button "Create new parameter
file".

E.g.: project_varparm_sm.txt

By clicking this button, you will create and save the new parameter
file.

3. "Select parameter file"
--------------------------
If you choose to use different parameter files for various projects, you
can select the relevant file via the drop down menu and load it by
clicking the button "Select parameter file".

-------------------------------------------------------------
Step 1: Build keyword list as a CSV database (buildcsv.cgi)
-------------------------------------------------------------

1. Field "Project name"
-----------------------
You can enter any project name of your choosing in this field, e.g. the
Shadow Domain's or your client's name, etc.

2. Field "Keyword flat file (name)"
-----------------------------------
The function of the script "buildcsv.cgi" is to assign individual
landing pages to your chosen keywords/search phrases. You can determine
your list of keywords/search phrases either by reading it from a file or
by entering the terms in the text area field "Keywords list".

If you wish to work from a dedicated text file, you can enter the file's
name (without its system path) in the field "Keyword flat file (name)"
and check the box "Load keywords from keyword flat text file". Before
proceeding in this manner you must upload the file to the directory
"input/input_keywords/".

3. Field "Keyword CSV file (name)"
----------------------------------
Landing pages are assigned to individual keywords and the resulting list
will be stored in CSV format (comma delimited). Enter the file's name
(without its system path) in this field.

4. Text area "Keywords list"
----------------------------
If you don't want to load the keyword/search phrase list from a separate
file, you can cut and paste or type them into this text area instead.
Each keyword must be entered on a separate line.

Example:

keyword1
keyword2
keyword3
search phrase1
search phrase2
search phrase3

You may enter an unlimited number of keywords/search phrases.
Keywords/search phrases may not be repeated. You may, however, enter
different versions, e.g.:

online travel
online travel booking
online travel bookings
where do i find cheap online travel bookings
... etc.

5. Button "Save + continue"
---------------------------
If you have checked "Load keywords from keyword flat text file", you
will first load these keywords into the text area "Keywords list" by
clicking on the button "Save + continue". You may still edit or delete
keywords/search phrases at this stage. If you click the button "Save +
continue", all entries in the text area field will be stored in a flat
text file, whereupon the next HTML template will be loaded where you can
assign a landing page to each keyword/search phrase.

If you have not checked "Load keywords from keyword flat text file", you
will save your keywords list in the keyword flat text file by clicking
the button "Save + continue", provided you have entered a file name in
the field "Keyword flat file (name)". Now, the next HTML template will
be loaded where you can assign a landing page to each keyword/search
phrase.

Second Page - Assigning Landing Pages
-------------------------------------
This page displays the predefined keywords/search phrases in the left
column. You may now enter the individual landing page URLs you wish to
assign to each keyword/search phrase in the right column. These are the
pages to which your human visitors who have entered the pertinent
keyword/search phrase in the search engines will be redirected.

Example:

keyword1 --- http://www.yourdomain.com/product1.html
keyword2 --- http://www.yourdomain.com/product2.html
keyword3 --- http://www.yourdomain.com/product3.html
... etc.

If you don't enter a specific landing page URL in a right column row,
the Standard URL will be assigned to the keyword/search phrase in
question.
This Standard URL will later be defined in the central Keyword Switch File when setting up the Shadow Domain. You must enter a minimum of one URL in the right column. If you want to redirect all traffic to the Standard URL, regardless of which keywords/search phrases the search engine user entered, you may skip Step 1.

------------------------------------------------------------------
 Step 2: Determine web pages to mine for fillertext (geturls.cgi)
------------------------------------------------------------------

This script parses the search engines' results pages and fetches the URLs relating to your keywords/search phrases. The keywords/search phrases used are read either from a flat file list of keywords/search phrases or from the Keyword CSV file generated under Step 1.

1. Field "Project name"
-----------------------
You may enter any project name of your choosing in this field. For example, this could be the name of the Shadow Domain, a client's name, etc.

2. Field "Keyword flat file (name)"
-----------------------------------
If you want the program to read your list of keywords/search phrases from a flat file, enter the file's name (without its system path) in this field.

File format:
------------
The keywords must be entered one per line.

Example:

keyword1
keyword2
keyword3

You may enter an unlimited number of keywords. Blank lines and any lines commented out with # will be skipped.

Please create this file manually in ASCII mode with a plain text editor. Then, upload it by FTP in ASCII mode (this is critical!) to your server directory: "input/input_keywords/"

3. Radio button "Use keyword flat text file"
--------------------------------------------
If you wish to use a keyword flat file, enter the name of the file (without its system path) in the field "Keyword flat file (name)" and check the radio button "Use keyword flat text file".

4.
Field "Keyword CSV file (name)"
----------------------------------
If you generated a Keyword CSV file in Step 1, enter the file's name in this field. Should you proceed directly from Step 1 to Step 2 without changing the parameter file, the file name chosen in Step 1 will be used.

5. Radio button "Use keyword CSV file"
--------------------------------------
To have the program read your list of keywords/search phrases from the Keyword CSV file, check this radio button.

6. Field "URL list file (name)"
-------------------------------
The collected URLs will be stored in this file. Enter the file name (without its system path) in this field. This file will be stored automatically in directory "admin/admin_urls/".

7. Checkbox "Send E-Mail Notification"
--------------------------------------
Depending on the number of keywords/search phrases chosen, the time used for process "Get URLs" may vary considerably. Here's an indication of what you may expect: If you want the program to check and list the URLs for 50 keywords at 10 search engines, the process will take approximately 1 hour.

If you wish to be notified by e-mail once the process is completed, check the box marked "Send E-Mail Notification". The script will then dispatch a notification to the e-mail address you specified in the setup file "SMSetup.pm" under variable "$to_mail".

Status Message: "Current configuration: max xxxx URLs."
-------------------------------------------------------
This status message indicates the maximum number of URLs the program will fetch from the search engine results pages to analyze them. This value (default: "1000") is defined in the setup file SMSetup.pm as variable "$max_url". While technically well versed users could customize this value themselves, this is not supported: bear in mind that search engines take a dim view of automated queries because of the massive, expensive bandwidth they tend to eat up.
Hence, to avoid engines blocking your server IP for possible bandwidth wastage, we strongly recommend sticking to the default value of max. 1000 URLs, which will normally suffice to generate sufficiently varied, relevant fillertext content for your Shadow Domain's phantom pages.

Selecting Search Engines
------------------------
In the lower section of the HTML template you can select the search engines to parse. You can toggle "select all/none" with a single click on the respective radio buttons. Provided your browser's JavaScript function is enabled, you may also use the button "Check All/Uncheck All".

8. Button "Get URLs"
--------------------
After entering all required data, you can initialize the process by clicking the button "Get URLs". Initialization of the process will be confirmed in a new template. "Get URLs" will be handled by a new UNIX process which will run in your system's background. Hence, you are not required to sustain an online connection to your web server or to monitor the procedure. Instead, you may dedicate this time to other activities.

-----------------------------------------------------------
 Step 3: Get fillertext for Shadow Domain (getcontent.cgi)
-----------------------------------------------------------

Based on the URLs determined in the previous step, Step 3 will now build the required fillertext content. To this effect, the selected URLs will be spidered, their content whacked, sanitized, jumbled, sorted and stored in a single large content file.

1. Field "Project name"
-----------------------
You may enter any project name of your choosing in this field. For example, this could be the name of the Shadow Domain, a client's name, etc.

2. Field "URL list file (name)"
-------------------------------
The URLs collected in Step 2 were stored under this file name in directory "admin/admin_urls/". This file will now be used for input. Enter the file name (without its system path) here.

3.
Field "Content file (name)"
------------------------------
This file will contain the spidered, whacked, sanitized and processed fillertext content. Enter the file name (without its system path) in this field. The file will be stored in directory "input/input_contents/".

4. Selection field "Language"
-----------------------------
When spidering URLs, the pages' meta tags are analyzed for body text language indicators. If the meta tag section indicates a body text language other than the one chosen with this drop-down menu, the URL will be skipped and no content will be retrieved. Currently, the supported languages are English and German.

Example:
--------
Your language of choice is "English". If a meta tag on the spidered page indicates languages German or French, the URL will be skipped. If no language meta tag is found on the spidered page, language analysis is terminated and the content will be whacked for further processing.

5. Button "Select URLs"
-----------------------
Clicking this button will display the next HTML template where you can edit your URL list.

Editing the URL list file
-------------------------
The URL list is displayed with checkboxes. This is the list of URLs the fantomas shadowMaker(TM) spider determined as relevant to your list of keywords/search phrases. Fillertext content will be whacked from these URLs for further processing to serve as a semantic base for your Shadow Domain's phantom pages.

You can unselect any URL you do not want to use for this process by clicking its pertinent checkbox. After selecting/unselecting URLs, you can save the updated file by clicking the button "Save URL list file".

Tip: After saving the modified URL list file in this manner, the page will refresh to display the remaining selected URLs. If the list is very long, you may want to save modifications occasionally in between. That way, should you experience a system crash, you won't lose all your work.
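
The language check described under 'Selection field "Language"' above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the program's actual Perl code. Which meta tags are inspected is not documented here, so the tag names "content-language" and "language", the use of ISO-style codes such as "en" or "de", and the function name should_skip are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class LanguageMetaParser(HTMLParser):
    """Collects the first language indicator found in a page's meta tags."""

    def __init__(self):
        super().__init__()
        self.language = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.language is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: the language is declared via http-equiv or name.
        key = (attr.get("http-equiv") or attr.get("name") or "").lower()
        if key in ("content-language", "language"):
            self.language = attr.get("content", "").strip().lower() or None


def should_skip(html, wanted):
    """Return True if the page declares a language other than `wanted`."""
    parser = LanguageMetaParser()
    parser.feed(html)
    if parser.language is None:
        # No language meta tag: the page is kept for further processing.
        return False
    return not parser.language.startswith(wanted.lower())
```

With "en" as the chosen language, a page declaring "de" would be skipped, while a page declaring "en-US" or declaring nothing at all would be kept.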
Exclusion Lists - General Outline
---------------------------------
Next, a block covering the exclusion lists will be displayed. To sanitize the whacked content, all HTML tags and script sections will be removed, as will all URLs and e-mail addresses featured on the whacked page.

You can further sanitize the fillertext by excluding specific content (words, phrases, etc.) via the exclusion lists. This will typically cover trademarks, competitor names, navigation text, four letter words, etc. The program comes with 5 ready-made exclusion lists. You may edit, modify or expand these lists individually to suit your needs.

6. Button "Edit all Exclusion Lists"
------------------------------------
Clicking this button will display a new HTML template where you can edit all exclusion lists on a single page.

Note: Depending on the size of the exclusion lists, this page may take quite long to load!

7. Button "Trademarks"
----------------------
The Trademarks exclusion list contains trademarks you wish to exclude from your Shadow Domain fillertext to avoid undesirable association with your own products and services within search engine results pages.

Note: While this list comes ready made during installation, it makes no claims to being comprehensive! Depending on the particular industry you are targeting and your particular jurisdiction's legal framework, you may want to edit this list extensively to accommodate your specific requirements and to comply with local laws and regulations. If in doubt, please consult with qualified legal counsel.

8. Button "Company Names"
-------------------------
The Company Names exclusion list contains company names you wish to exclude from your Shadow Domain fillertext to avoid undesirable association with your own products and services within search engine results pages.

Note: While this list comes ready made during installation, it makes no claims to being comprehensive!
Depending on the particular industry you are targeting and your particular jurisdiction's legal framework, you may want to edit this list extensively to suit your specific requirements and to comply with local laws and regulations. If in doubt, please consult with qualified legal counsel.

9. Button "Family Names"
------------------------
The Family Names exclusion list contains family names you wish to exclude from your Shadow Domain fillertext to avoid undesirable association with your own products and services within search engine results pages.

10. Button "First Names"
------------------------
The First Names exclusion list contains first names you wish to exclude from your Shadow Domain fillertext to avoid undesirable association with your own products and services within search engine results pages.

11. Button "Miscellaneous"
--------------------------
The Miscellaneous exclusion list contains words and phrases you wish to exclude from your Shadow Domain fillertext to avoid undesirable association with your own products and services within search engine results pages. This list will typically include offensive material such as four letter words, ambiguous terms possibly pointing to a different type of industry than the one you are actually targeting, and anything else not covered by the four other exclusion lists described above.

By clicking one of the above buttons, you will be guided to a new HTML page where you can edit the respective exclusion list.

12. Checkbox "Send E-Mail Notification"
---------------------------------------
Depending on the number of URLs chosen, the time used for process "Get Content" may vary considerably. Here's an indication of what you may expect: If you want the program to spider 500 URLs, the process will take approximately 1 hour.

If you wish to be notified by e-mail once the process is completed, check the box marked "Send E-Mail Notification".
The script will then dispatch a notification to the e-mail address you specified in the setup file "SMSetup.pm" under variable "$to_mail".

13. Button "Get Content"
------------------------
After entering all required data, you can initialize the process by clicking the button "Get Content". Initialization of the process will be confirmed in a new template. "Get Content" will be handled by a new UNIX process which will run in your system's background. Hence, you are not required to sustain an online connection to your web server or to monitor the procedure. Instead, you may dedicate this time to other activities.

Editing Fillertext Content Manually
-----------------------------------
After completion of this process, the generated content file should be checked and edited further manually. To do this, proceed as follows:

1. Copy the file from directory "input/input_contents" to your local system by FTP or Telnet.

2. Next, open the file with a plain text editor such as Notepad.

3. Delete unwanted terms, names, etc. not covered by the exclusion lists from the fillertext content file.

   Tip: To reduce later workload, you may want to add material tagged manually for general exclusion to your exclusion lists by going online, choosing Step 3 from the fantomas shadowMaker(TM) Main Menu and selecting the appropriate exclusion list. You can then cut and paste anything you want to see excluded from later fillertext content generation into the exclusion list of your choice and save the result.

4. After manually editing the fillertext content file, upload it again via FTP or Telnet to your web server into directory "input/input_contents", overwriting the previous version.

Template "Select URLs"
----------------------
This page displays all URLs included in the URL list. The pertinent checkboxes are all preselected by default for your convenience. Thus, to unselect a URL, click on its checkbox. This allows you to exclude specific URLs from spidering.
After you have finished the URL selection process, click on the button "Save URL list file" to save the newly edited URL list.

Template "Edit all exclusion lists"
-----------------------------------
This template offers you the option of editing your exclusion lists. Terms targeted for exclusion must be entered one per line.

Example (for: Company names exclusion list):
--------------------------------------------
fantomaster.com
AOL
IBM
Microsoft
MSN

You can enter an unlimited number of words. Blank lines and lines commented out with # will be skipped.

Tip: By entering the hash symbol # as the first character, you may segment your exclusion lists by adding comments for later reference, etc. Take care to restrict comments to single lines, or you will have to add another # to each following line to prevent the comment from being treated as part of the exclusion material!

Exclusion lists are NOT case sensitive! I.e. "Somecompany", "SOMECOMPANY" and "somecompany" will be treated as being identical. This cuts down on processing time and saves you the trouble of having to enter exclusion content in various cases to cover all possibilities.

NO STEMMING, ABSOLUTE CHARACTER STRINGS ONLY - NO WILDCARDS ALLOWED! If you add a term like "somecomp*", the program will only exclude exactly that: "somecomp*". It will NOT cover "somecompany", "somecompanies", etc.

You may save the lists individually by clicking the buttons "Save trademarks list", "Save company names list", etc. Alternatively, you can save all lists in one go by clicking "Save all exclusion lists".

Warning:
--------
Be aware that if you delete the content of a list in the text area fields completely and click on "Save", you will effectively delete the file's complete content. This will leave you with a blank list file.

Template "Edit individual exclusion lists"
------------------------------------------
This template will only display the preselected individual exclusion list.
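
The exclusion list rules described under 'Template "Edit all exclusion lists"' above (one term per line, blank lines and # comment lines skipped, matching not case sensitive, no wildcards) can be sketched as follows. This is an illustration in Python, not the program's actual Perl routine. Whether terms are matched as whole tokens or as substrings is not stated; the sketch assumes whole tokens, in keeping with the "no stemming" rule. The function names are hypothetical.

```python
import re


def parse_exclusion_list(text):
    """Return the terms of an exclusion list, skipping blanks and # comments."""
    terms = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms.append(line)
    return terms


def apply_exclusions(content, terms):
    """Remove every listed term from the content, ignoring case."""
    for term in terms:
        # re.escape treats the term as a literal string, so "*" is no wildcard.
        # Assumption: a term only matches when it is not part of a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        content = re.sub(pattern, "", content, flags=re.IGNORECASE)
    return content
```

Under these assumptions, the list entry "IBM" removes "ibm" and "IBM" alike, while "somecomp*" removes only the literal string "somecomp*" and leaves "Somecompany" untouched.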
After editing, the exclusion list can be saved by clicking on the button "Save exclusion list file".

-------------------------------------------------------
 Step 4: Generate Shadow Domain pages (createsite.cgi)
-------------------------------------------------------

In Step 4, you can finally generate the optimized phantom pages which will constitute your Shadow Domain.

Toggle button "Select bulk mode" / "Select standard mode"
---------------------------------------------------------
By default, "Standard mode" is activated. In this mode, only a single domain job run will be processed at any given time. By contrast, in "Bulk mode" you can either process multiple runs per domain or even generate phantom pages for an unlimited number of domains. (This is limited only by your storage capacity.) To accommodate the parameters required for proper processing in bulk mode, you will find entry fields that differ from those in standard mode.

These entry fields will now be discussed in the order in which they appear on the GUI.

1. Field "Project name"
-----------------------
You may enter any project name of your choosing in this field. For example, this could be the name of the Shadow Domain, a client's name, etc.

2. Field "Keyword flat file (name)"
-----------------------------------
If you want the program to read your list of keywords/search phrases from a flat file, enter the file's name (without its system path) in this field.

File format:
------------
The keywords must be entered one per line.

Example:

keyword1
keyword2
keyword3

You may enter an unlimited number of keywords. Blank lines and any lines commented out with # will be skipped.

Please create this file manually in ASCII mode with a plain text editor. Then, upload it by FTP in ASCII mode (this is critical!) to your server directory: "input/input_keywords/"

3.
Radio button "Use keyword flat text file"
--------------------------------------------
If you wish to use a keyword flat file, enter the name of the file (without its system path) in the field "Keyword flat file (name)" and check the radio button "Use keyword flat text file".

4. Field "Keyword CSV file (name)"
----------------------------------
If you generated a Keyword CSV file in Step 1, enter the file's name in this field. Should you proceed directly from Step 1 to Step 2, 3 or 4 without changing the parameter file, the file name chosen in Step 1 will be used.

5. Radio button "Use keyword CSV file"
--------------------------------------
To have the program read your list of keywords/search phrases from the Keyword CSV file, check this radio button.

Writing Descriptions
--------------------
Take great care when writing your descriptions as they will be featured right at the top of your pages' body text - this means that many search engines will display them in their search results. Two obvious exceptions to this rule are Google, which will only display text snippets, and, as of recently, FAST/Alltheweb: this engine will display text snippets and the descriptions included in the META Description section.

6. Field "Description file (name)"
----------------------------------
Enter the name of the file containing your page descriptions here.

File format:
------------
The descriptions must be entered one per line.

Example:

description1
description2
description3

Please create this file manually in ASCII mode with a plain text editor. Then, upload it by FTP in ASCII mode (this is critical!) to your server directory: "input/input_descriptions/"

7. Field "Content file (name)"
------------------------------
This file contains the spidered, whacked, sanitized and processed fillertext content. Enter the file name (without its system path) in this field. This file was generated during the previous step and was stored in directory "input/input_contents/".

8.
Field "Submission list file (name)"
------------------------------------
Once the phantom pages have been generated, a list of URLs is created which will later be submitted to the search engines. Enter the submission file name (without its system path) in this field. It will be stored in directory: "admin/admin_submissions".

9. Radio button "Append to existing list" or "Overwrite existing list"
----------------------------------------------------------------------
This radio button allows you to define whether the URLs are to be appended to an existing list or whether the existing list should be overwritten.

It is strongly recommended not to restrict the generation of phantom pages to one per keyword/search phrase. Rather, we suggest generating several phantom pages per keyword/search phrase in varying keyword density and/or file lengths. In this case your obvious choice would be "Append to existing list".

10. Field "Links list file (name)"
--------------------------------
Provided you are working from a CSV file, generating the phantom pages will also automatically trigger creation of a file defining which landing page human visitors will be redirected to, depending on the keyword/search phrase they used to find your Shadow Domain in the search engine. Enter the file name (without its system path) in this field. It will be stored in directory: "admin/admin_links". You will find further details on how to use the links list below.

11. Field "Temporary output directory"
--------------------------------------
The generated phantom pages will be stored in a separate directory. Enter the directory name in this field.

Phantom Pages - General Outline
-------------------------------
Phantom pages are ordinary static HTML pages that derive their name from the fact that they are not intended for human perusal. They are intended only for the search engine spiders that crawl them. By contrast, human visitors will be redirected in realtime to your main or Core Domain.
Input wise, phantom pages are generated from predefined:

- Keywords/Search Phrases
- Descriptions
- Fillertext

The fantomas shadowMaker(TM) program can generate an unlimited amount of unique phantom pages by randomizing the choice of fillertext which, on the phantom page proper, constitutes the body text, to be augmented/"inoculated" by your predefined keywords/search phrases to your predefined keyword density.

This is the basic structure of a phantom page:

HEAD
- title
- META Keywords
- META Description

BODY
- description
- cross links
- random content (processed to feature keywords/search phrases in predefined density)

12. Field "Shadow domain (http://...)"
--------------------------------------
Enter your Shadow Domain's full URL in this field.

E.g.: http://www.yourshadowdomain.com/

This entry will be required for building the URLs in your search engine submission list.

13.1 Checkbox "Generate index page(s) only"
--------------------------------------------
Select this option e.g. if you want to generate fresh index pages in a subsequent run. This will allow you to optimize your index page(s) for additional keywords, etc.

13.2 Checkbox "Overwrite existing index page(s)"
------------------------------------------------
Select this option to overwrite any existing index page(s) generated in a previous run. If this checkbox is not selected with previously generated index pages on your system, the following error message will pop up:

"There are already existing index pages. Please choose overwrite before you start creating new ones."

To remediate this situation, return to Step 4 and select the overwrite option.

14.1 Field "Title of index page"
--------------------------------
Enter a fixed title for your Shadow Domain's index (home) page in this field. You may optionally select "Choose keyword of index page" here. (More on this function further below.) Now, if you enter a title with the placeholder symbol "#" here, e.g.
"We offer great # stuff!", the placeholder will be replaced by the pertinent keyword or keyword phrase during page generation. If you don't select the option "Choose keyword of index page", the placeholder will be removed. In the example above, this would simply generate an index page with the text "We offer great stuff!" in the title tag. 14.2 Text area "Body text of index page" ---------------------------------------- By default, the body text of your Shadow Domain's index (home) page is exempt from randomization and must be entered in this text area field. IN STANDARD MODE: If you select option "Choose keyword of index page" but fail to enter text for "Body text of index page" in the textarea field, the index page's body text will be generated randomly from the fillertext file as for any other phantom page. The program will also include a selected keyword or search phrase into this page's body text. (See pursuant description.) 14.3 Checkbox "Choose keyword of index page" -------------------------------------------- When selecting this option, the index page will be optimized for a keyword or search phrase. As the program cannot know of itself which particular keyword or search phrase you want to optimize the index page for, you are offered three possible options to choose from. 14.4 Radiobox "Select randomly from keyword file" ------------------------------------------------- Select this option if you want the keyword or search phrase to be selected randomly from the standard keyword file as defined under either "Keyword flat file (name)" or "Keyword CSV file (name):" above. (Choice of keyword file will depend on which of these two options you have selected.) 14.5 Radiobox "Keywords for random selection" (with text area) -------------------------------------------------------------- This option allows you to manually define a separate list of keywords or search phrases for your index page only. Each keyword must be entered in a separate line. 
Example:
--------
keyword1
keyword2
keyword3
search phrase1
search phrase2
search phrase3

14.6 Radiobox "Additional keyword flat file (path/name) for random selection" (with entry field)
------------------------------------------------------------------------------------------------
As a third "Choose keyword of index page" option, you can have the program pull your keyword from a dedicated plain text file. Define this file's name in the entry field. If you enter a file name only, the file in question must reside in directory "input/input_keywords". Else, please define the full system path along with the file name.

15. Field "Title of phantom pages"
----------------------------------
The phantom page titles will be generated according to a fixed mechanism, integrating the relevant keyword/search phrase with a text to be predefined by you in this field. The keyword/search phrase will be represented by a place holder.

Example:
--------
Best # here
We offer #
Find infos about #

During phantom page generation, the place holder # will be replaced by the keyword/search phrase relevant to the phantom page in question. Thus, the final titles in our example might look like the following:

Best online casinos here
We offer security solutions
Find infos about search engine optimization

Tip: For optimum ranking and traffic generation results, we recommend using short, snappy titles of no more than two to max. four terms (plus keywords/search phrases).

16. Field "Text above cross links"
----------------------------------
On the phantom page, following the page description right at the top of the BODY section, selected cross links to other phantom pages will be listed. By default, the first two links will point to the Shadow Domain's index page and the sitemap (which will be generated automatically), followed by 4 additional links selected randomly.
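
The default cross link selection just described can be sketched as follows. This is a Python illustration, not the program's actual Perl code. The file names "index.html" and "sitemap.html", the function name and its parameters are assumptions made for the example.

```python
import random


def pick_cross_links(current_page, all_pages, extra=4, rng=None):
    """Return the cross links for one phantom page.

    The first two links point to the index page and the sitemap;
    up to `extra` further links are picked at random from the
    remaining phantom pages.
    """
    rng = rng or random.Random()
    # A page should not link to itself.
    candidates = [page for page in all_pages if page != current_page]
    chosen = rng.sample(candidates, min(extra, len(candidates)))
    # Assumption: fixed names for the index page and the sitemap.
    return ["index.html", "sitemap.html"] + chosen
```

Because every page receives its own random selection, each phantom page will tend to be linked from several others once the number of pages is large enough.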
This cross linking mechanism ensures that all phantom pages are cross linked multiple times across the whole Shadow Domain, thus avoiding orphan pages, which are viewed unfavorably by the search engines.

You can optionally include further text above the cross links section. (E.g. "printer friendly copy", or similar.)

17. Field "Text under cross links"
----------------------------------
Text optionally entered in this field will be added immediately below the cross links section. (E.g. "printer friendly copy", or similar.)

18. Field "Max. links per sitemap"
----------------------------------
If generating a large number of phantom pages for a domain, it is recommended not to feature all internal links on a single sitemap page. (In an attempt to weed out mere link farms, some search engines will limit the number of links their spiders may follow per given page.)

This parameter lets you define the maximum number of links to be featured on your domain's sitemap pages. In accordance with this parameter, the program will create multiple sitemap pages which will then be linked to in aggregate from a top sitemap page. To accommodate very large numbers of links, the program may also generate multiple layered sitemap top pages.

19. Checkbox "Include Title attribute for links"
------------------------------------------------
Check this box if you want to include title attributes for the links on your sitemap page(s). This will create title attribute entries in the following format (example):

<a href="..." title="key phrase">key phrase</a>

20. Field "Content text weight (kB)"
------------------------------------
In this field you can determine the size of the randomized body text to be included in the phantom pages. Note that the actual file size will be somewhat larger due to additional overhead being added by HTML tags and JavaScript code.
Tip: When generating several phantom pages per targeted keyword/search phrase in varying keyword densities (recommended), we suggest you also experiment with varying content text sizes for optimum ranking results.

21. Field "TARGkd"
------------------
Enter the targeted keyword density in this field. To achieve the predefined keyword density, the phantom page body text is basically processed as follows:

First, any occurrences of predefined listed keywords/search phrases possibly already included by chance in the fillertext will be deleted and replaced by place holders. This avoids the issue of inadvertently increasing the density of any given keyword/search phrase beyond the predefined mark.

Next, the place holder will be replaced by the keyword/search phrase desired for the page in question.

Finally, the text will be processed further to fine tune the exact keyword density to the desired value.

22. Checkbox "Include Cache Buster"
-----------------------------------
If you want to cloak for Google, you should implement a cache buster (JavaScript code) on every phantom page redirecting human viewers to the target URL. Note that this won't provide you with 100% security - if someone turns off JavaScript functionality in their browser, they will still be able to see your phantom page code! However, this is the next best option to being stigmatized by Google for using their own proprietary NOARCHIVE tag.

If you select the checkbox "Include Cache Buster", a JavaScript section will be included in every phantom page with the respective target URL.

23. Checkbox "Define optional Meta Tags"
----------------------------------------
Checking this option will open additional fields in which you can define additional meta tags. Deselecting this box will remove these additional fields.

24.
Checkbox "Include NO-ARCHIVE meta tag (NOCACHE)"
----------------------------------------------------
By including this meta tag you can prevent Google and other search engines from caching your spidered phantom pages and displaying the page content via the "cached version" function on their search engine results pages (SERPs). (This is recommended.)

25. Checkbox "Include additional keywords (Meta tag "keywords")"
----------------------------------------------------------------
By default, only the specific keyword or search phrase for which any given phantom page is optimized will be featured in the "keywords" meta tag. By checking this option you can include additional keywords in this meta tag.

26. Radiobox "Additional keywords" (with entry field)
-----------------------------------------------------
Here you can define additional keywords to be included in every phantom page's "keywords" meta tag throughout the generated domain. Note that multiple keywords must be separated by commas! (No blanks required.)

27. Radiobox "Keywords for additional random inclusion" (with text area)
------------------------------------------------------------------------
If you don't want to include a fixed list of keywords, you can instead define a list of keywords in this text area from which the program will randomly select additional keywords to be included within the "keywords" meta tag. Each keyword must be entered on a separate line.

Example:

keyword1
keyword2
keyword3
search phrase1
search phrase2
search phrase3

28. Radiobox "Keywords file (path/name) for additional random inclusion" (with entry field)
-------------------------------------------------------------------------------------------
As a third "keywords" meta tag option, you can have the program pull your keywords from a plain text file. Define this file's name in the entry field. If you enter a file name only, the file in question must reside in directory "input/input_keywords".
Else, please define the full system path along with the file name. 29. Field "Number of additional keywords" ----------------------------------------- Define here the number of additional keywords you want to have included in your pages' meta tags. 30. Checkbox "Generate meta tag 'author'" ----------------------------------------- By selecting this option, the "author" meta tag will be included to define the phantom page's author. This can help in creating a more "organic" look-and-feel for your phantom pages. 31. Radiobox "Content" (with entry field) ----------------------------------------- Define the content for the "author" meta tag in this field. 32. Radiobox "Content for random selection" (with text area) ------------------------------------------------------------ For more variety, you can define multiple entries for the "author" meta tag here. From these, the program will randomly select one entry per phantom page. Each author must be entered in a separate line. Example: author1 author2 author3 33. Checkbox "Generate meta tag 'generator'" -------------------------------------------- By selecting this option, the "generator" meta tag will be included to define the phantom page's generator. This can help in creating a more "organic" look-and-feel for your phantom pages. 34. Radiobox "Content" (with entry field) ----------------------------------------- Define the content for the "generator" meta tag in this field. 35. Radiobox "Content for random selection" (with text area) ------------------------------------------------------------ For more variety, you can define multiple entries for the "generator" meta tag here. From these, the program will randomly select one entry per phantom page. Each generator must be entered in a separate line. Example: generator1 generator2 generator3 36. 
Checkbox "Generate meta tag 'content language'" --------------------------------------------------- As a final meta tag option, selecting this box lets you define your pages' content language. 37. Field "Content" ------------------- Enter the language code here. This should be consistent with your page's actual language. Some sample language codes: en en-US en-GB de fr ------------------------------------------------------------------------- File name structure for phantom pages - STANDARD MODE (SINGLE JOB/DOMAIN) [See below for a detailed outline of Bulk Mode operation.] ------------------------------------------------------------------------- In this section, the file name structure is configured. By default, phantom page file names are generated following this basic scheme: keyword-01-01.html Example: -------- keyword/search phrase = "search engine optimization" >>>>>>>>>>>> file name = "search-engine-optimization-01-01.html" In this example, the individual components of the search phrase are separated by a "-" (hyphen). 
The two-digit numbers at the end of the file name constitute administrative indicators: - first digit pair "01": indicates that this is the first phantom page generated for this particular keyword/search phrase - second digit pair "01": indicates that this is the first phantom page generation job for this particular Shadow Domain Thus, if you run a second job generating fresh phantom pages for the same set of keywords/search phrases, but with a different keyword density, you would receive the following file name: "search_engine_optimization-02-02.html" Consequently, adhering to our example, a third run, adding another keyword/search phrase ("search engines optimization"), would generate: "search_engines_optimization-01-03.html" When conducting ranking checks later on, after the phantom pages have been submitted to the search engines and indexed, it will be easy to determine from the file name and the log file (for which see below) which parameters were employed to achieve a particular ranking. This in turn will facilitate further fine tuning of your SEO strategy for optimal results. 38. Textarea "File name phrasal separators" ------------------------------------------- - In textarea "File name phrasal separators", to separate the constituent elements of a multi-term search phrase (as opposed to a single term keyword) you may define the characters to be used. Valid characters are: Aa-Zz,0-9,-_+,.;!~ If you should happen to define some invalid characters by mistake, they will automatically be removed from the list during processing. By default, the text area comes populated with the following characters after installation: - _ + , . ; ! ~ You may add or delete characters from this standard list at your leisure. During phantom page generation, the program will select random separators from whichever valid characters are included in this field. This allows for more flexibility and uniqueness ("de-patternizing") when assigning file names to your phantom pages. 39. 
Textarea "Separators Keyword/Search Phrases – Definers"
-----------------------------------------------------------
- In textarea "Separators Keyword/Search Phrases – Definers", you may similarly define the characters separating your keywords/search phrases from their suffixed file name definers (e. g. "-01-01" in default mode). All other rules outlined above (valid characters, etc.) apply in full to this textarea field as well.

40. Checkbox "Further customize file name structure as defined below"
---------------------------------------------------------------------
Selecting this box allows you to further customize your file name structure beyond the default values explained above. When activated, file names are constructed from three basic definer elements.

Example:
Definer Element 1: page
Definer Element 2: _
Definer Element 3: eg2a3b

You can configure each of these elements separately within the following fields.

41. Radiobox "File name definer element 1" (with entry field)
-------------------------------------------------------------
Selecting this option will generate file names with a set (fixed) file name element 1.

42. Radiobox "Randomize file name definer element 1 with" (with text area)
--------------------------------------------------------------------------
This option allows for multiple strings to be used for element 1. You may enter these strings within the text area field. By default, some example strings are included in the text area. You may use, modify, delete or expand this list in accord with your requirements.

43. Radiobox "File name definer element 2" (with entry field)
-------------------------------------------------------------
Selecting this option will generate file names with a set (fixed) file name element 2.

44.
Radiobox "Randomize file name definer element 2 with" (with text area) -------------------------------------------------------------------------- This option allows for multiple strings to be used for element 2. You may enter these strings within the text area field. By default, some example strings are included in the text area. You may use, modify, delete or expand this list in accord with your requirements. Note that syntactically invalid entries not in accord with web file naming standards will be removed automatically before processing. 45. Selectbox "File name definer element 3" ------------------------------------------- This box gives you a choice of options for file name element 3: - Letters (lower case) only - Letters (mixed case) only - Letters (upper case) only - Numerals + letters (mixed case) - Numerals only 46. Radiobox "Fixed no. of chars definer element 3" (with entry field) ---------------------------------------------------------------------- Selecting this option and defining an integer will generate file names with a set (fixed) number of characters for file name element 3. 47. Radiobox "Randomize no. of chars file name definer element 3. Range" (with entry fields) -------------------------------------------------------------------------------------------- Select this radiobox to define a range of number of characters to be used for file name element 3 from which the program will choose a random value. 48. Button "View examples of file names" ---------------------------------------- Click this button to see 5 examples of file names generated according to your specifications in the fields and boxes above. This allows you to make final amendments and modifications if required before actually triggering the phantom page generation process proper. 49. Checkbox "Include RSS Feed(s)" ---------------------------------- By popular demand, the program offers integration of RSS feeds into all, select or a randomized number of generated phantom pages. 
The advantage of this powerful feature is that it helps keep your phantom page content fresh automatically: provided the specified RSS feed is functional and updated regularly, a search engine spider will usually detect new content every time it crawls your phantom page. And, of course, search engines are known to prefer fresh, regularly updated pages. On the downside however, note that integrating RSS feeds will impair precision of your configured keyword density and text weight values for that given page! To make the most of both options (i. e. with/without RSS feeds), we strongly recommend our customary "buckshot" approach by selecting "include randomly in every ... nth page", with "..." being any integer greater than 1. This way you will have a mixed assortment of phantom pages both featuring and not featuring RSS feeds for optimal results. LEGAL NOTICE + DISCLAIMER ------------------------- Unlike the fillertext generated by the program, RSS feeds will be pulled from their respective host servers and will be presented "as is", i. e. the feeds are not modified or sanitized in any way. This may raise legal issues concerning fair usage, potential copyright infringement, etc. PLEASE MAKE SURE TO SELECT ONLY RSS FEEDS THAT ARE EITHER IN THE PUBLIC DOMAIN OR IF YOU EITHER HAVE THE RIGHTS HOLDER'S EXPRESS PERMISSION TO DO SO OR IF YOU YOURSELF ARE THE LEGAL COPYRIGHT HOLDER! IN MOST JURISDICTIONS, WILLFUL COPYRIGHT INFRINGEMENT IS CONSIDERED AND TREATED AS A CRIMINAL OFFENSE AND MAY THUS INCUR GRAVE PENALTIES INCLUDING IMPRISONMENT AND HEFTY CIVIL DAMAGES CLAIMS! NEITHER FANTOMASTER.COM PGMBH AND ITS STAFF MEMBERS NOR ANY OF ITS SUBSIDIARIES OR MARKETING PARTNERS CONDONE OR ENDORSE COPYRIGHT INFRINGEMENT AND WILL NOT BE HELD RESPONSIBLE FOR ANY LEGAL CONSEQUENCES AND CLAIMS RESULTING FROM ABUSE OF THIS PROGRAM FEATURE. 50. 
Radiobox "Insert at beginning of body text" ----------------------------------------------- The RSS feed will be inserted at the beginning of a phantom page. 51. Radiobox "Insert at end of body text" ----------------------------------------- The RSS feed will be inserted at the end of a phantom page. 52. Radiobox "Insert on home/index page only" --------------------------------------------- The RSS feed will be inserted in the index/home page only. 53. Radiobox "Insert in all pages" ---------------------------------- The RSS feed will be inserted in all phantom pages. Note that this may consume considerably more bandwidth as the RSS feeds will first be spidered before being integrated in the phantom pages! 54. Radiobox "Randomize insertion in every nth page" (with entry field) ----------------------------------------------------------------------- The RSS feed will be inserted randomly only in every nth page (integer to be defined by you). This can reduce dramatically the number of pages featuring RSS feeds. 55. Textarea "RSS Feed URLs" ---------------------------- Define the RSS feeds you want to include in your phantom pages here, one entry per line. Example: RSS Feed URL 1 RSS Feed URL 2 RSS Feed URL 3 The program will then select a random RSS feed from this list when generating phantom pages. 56. Button "Check RSS Feeds" ---------------------------- All RSS feeds must be validated before this function can be used. To this effect, the listed feeds are run through the FEED Validator program at "http://www.feedvalidator.org/" and the results will be displayed in a fresh browser window. NOTE THAT THIS STEP IS MANDATORY IF YOU OPT FOR RSS FEED INCLUSION! ------------------------------------------------------------------------- File name structure for phantom pages - BULK MODE (MULTIPLE JOBS/DOMAINS) [See above for an outline of Standard Mode operation!] 
-------------------------------------------------------------------------
Note: The following section features fields only displayed when Bulk Mode is selected. For easier reference, they have been numbered separately with a "BM-" prefix. For all other fields identical in Standard and Bulk Mode, please refer to the documentation above.
-------------------------------------------------------------------------

BM-1. Textarea "Shadow domain(s) (http://...)"
----------------------------------------------
If you want to generate multiple Shadow Domains(TM) in one single job, list your domains in this text area field, one per line. Once you have pre-configured all required fields, clicking the button "Generate phantom pages" will trigger an automatic process generating all pages for every domain in your list. Note that, depending on your number of keywords and domains, this process may take several hours to complete!

-----------------------------------------
BM-2.1 Textarea "Titles of index page(s)"
-----------------------------------------
You may add one or more titles in this textarea field. If generating multiple domains, it is strongly recommended to define a variety of distinct "Title of index page" values, preferably one per domain! If you have selected the option "Choose keyword of index page" and enter a title with the placeholder symbol "#" in this field (e. g. "We offer great # stuff!"), the placeholder will be replaced by the pertinent keyword or keyword phrase during page generation. Note, however, that if you HAVEN'T selected the option "Choose keyword of index page" prior to entering a title containing the placeholder in this textarea field, the placeholder will be removed. (In the example above, this would simply generate an index page with the text "We offer great stuff!" in the title tag.)
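The placeholder behaviour described for BM-2.1 can be illustrated with a small Python sketch. This is not the program's own (Perl) code; the function name `fill_placeholder` is hypothetical, and collapsing the leftover double space when the placeholder is removed is an assumption inferred from the "We offer great stuff!" example.

```python
import re

def fill_placeholder(title, keyword=None):
    """Replace every "#" in title with keyword, or drop it when no keyword is set."""
    if keyword:
        return title.replace("#", keyword)
    # No keyword source selected: remove the placeholder and tidy up
    # the double space it leaves behind (assumed behaviour).
    return re.sub(r"\s{2,}", " ", title.replace("#", "")).strip()

print(fill_placeholder("We offer great # stuff!", "golf"))  # We offer great golf stuff!
print(fill_placeholder("We offer great # stuff!"))          # We offer great stuff!
```

The same idea applies to titles with several "#" symbols, since `str.replace` substitutes every occurrence.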
BM-2.2 Checkbox "Body text of index page" (with text area)
----------------------------------------------------------
By default, the index pages' body text will be generated randomly. However, if you wish to preselect a definitive text for your index pages, you may enter it manually here. If you select this checkbox, the program will process the text entered here instead of generating random body text content for your index pages. For greater variety and further optimization, the placeholder symbol "#" is supported under this function. If you want to take advantage of it, make sure you have selected the option "Choose keyword of index page", which defines the exact source of the keywords to be used for generating the index pages only. If you include multiple "#" symbols in the body text, they will be replaced by randomly selected keywords or search phrases from your specified list. Note, however, that if you HAVEN'T selected the option "Choose keyword of index page" prior to selecting this function, the placeholder will be removed.

BM-3. Textarea "Title of phantom pages"
---------------------------------------
You may list multiple titles in this textarea field, one per line. For a detailed explanation of the title structure, please see description under Field "Title of phantom pages" in the documentation above.

BM-4. Textarea "Text above cross links"
---------------------------------------
You may optionally list multiple text entries (one per line) to be featured above the cross links section of your phantom pages. For a detailed explanation of this feature, please see description under Field "Text above cross links" in the documentation above.

BM-5. Textarea "Text under cross links"
---------------------------------------
You may optionally list multiple text entries (one per line) to be featured under the cross links section of your phantom pages.
For a detailed explanation of this feature, please see description under Field "Text under cross links" in the documentation above.

BM-6. Radiobox "Max. links per sitemap" (with entry field)
----------------------------------------------------------
You may set a fixed number of maximum links per sitemap page here. For a detailed explanation of this feature, please see description under Field "Max. links per sitemap" in the documentation above.

BM-7. Radiobox "Randomize no. of sitemap links. Range" (with entry fields)
-------------------------------------------------------------------------
Instead of a fixed number of links per sitemap you may define a range of link numbers (min/max) here from which the program will randomly select a value.

BM-8. Radiobox "Use identical text weight (kB) for all domains" (with entry field)
----------------------------------------------------------------------------------
In this field you can determine the size of the randomized body text to be included in the phantom pages. Note that the value defined here will be used throughout ALL domains generated.

BM-9. Radiobox "Randomize text weight (kB) by domain. Range" (with entry fields)
--------------------------------------------------------------------------------
Instead of a fixed value for text weight you may define a range of weights here from which the program will randomly select a value. Note that this option will randomly set a uniform text weight value for all pages within a single given domain - thus, if generating multiple domains, each domain will have a different uniform text weight value for its pages.

BM-10. Radiobox "Use identical TARGkd for all domains" (with entry fields)
--------------------------------------------------------------------------
Select this radiobox to define an identical fixed targeted keyword density throughout ALL domains to be generated.

BM-11. Radiobox "Randomize TARGkd by domain.
Range" (with entry fields) ------------------------------------------------------------------------ Instead of a fixed set value for targeted keyword density you may define a range of values here. Note that this option will randomly set a uniform keyword density value for all pages within a single given domain - thus, if generating multiple domains each domain will have a different uniform keyword density value for its pages. BM-12. Radiobox "Randomize file name structure (all pages/all domains)" ----------------------------------------------------------------------- Select this option to fully randomize the phantom pages' file name structure across all pages and domains to be generated. Note that this will create domains with varying, i. e. not uniform page file name structures. BM-13. Radiobox "Randomize file name structure (uniform by domain)" ------------------------------------------------------------------- Select this option to randomize the phantom pages' file name structure across all domains to be generated. Note that this will create domains with a uniform page file name structure but varying from one domain to the other. BM-14. Numerical Field "No. of runs per domain" ----------------------------------------------- This option allows you to trigger multiple runs per domain, thereby generating several unique pages per keyword/search phrase for enhanced variability and greater exposure. Note that this field will apply to your total job configuration. E. g. if you decide to create 3 pages per keyword/search phrase with a randomized target keyword density etc., and if you have prescribed 10 domains to be generated, all 10 domains will feature three phantom pages per keyword/search phrase. -------------------------------- Checkbox "Background Processing" -------------------------------- If you generate lots of phantom pages, it is recommended to opt for background processing by checking this box to minimize on system resources. 
Example: -------- If your list of targeted keywords/search phrases contains 1000 entries, the program will take approximately (depending on various factors such as hardware, system load, traffic, etc.) 5 minutes to finalize the process. However, most systems will sever the connection between web server and browser after about 5 minutes, in which case the script cannot be closed properly. It is therefore recommended to check this option "Background Processing" from a minimum of appr. 500 keywords/search phrases to be on the safe side. "Generate phantom pages" will be handled by a new UNIX process which will run in your system's background. Hence, it is not required to sustain online connection to your web server or monitor the procedure. ----------------------------------- Checkbox "Send E-Mail Notification" ----------------------------------- If you wish to be notified by e-mail once the process is completed, check the box marked "Send E-Mail Notification". The script will then dispatch a notification to the e-mail address you specified in the setup file "SMSetup.pm" under variable "$to_mail". ------------------------------- Button "Generate phantom pages" ------------------------------- By clicking on this button, you will create and save the phantom pages. ============================= Setting Up Your Shadow Domain ============================= The Shadow Domain proper can be set up in various manners. 1. A tarball (a tar.gz packed archive file) is generated containing all phantom pages. (See further below for the details.) This file is then uploaded to the Shadow Domain proper via FTP or Telnet. There are two possible ways to do this: a) directly, by logging in to your Shadow Domain and by downloading the file from your temporary output directory using whatever FTP client is available, e.g. WU-FTP - please consult your FTP client's documentation to learn how to do this as we cannot cover such basics here. 
or

b) indirectly, by downloading the file to your local system first and uploading it to the Shadow Domain from there.

On the Shadow Domain, log in via Telnet, switch into your Shadow Domain directory, e.g. by typing:

cd /usr/www/htdocs/shadowdomain

Next, you will have to unpack the archive file. Enter the following command:

tar -xz -f project.tar.gz

Explanation:
------------
function: x   extract files from an archive
option:   z   uncompress data with gzip
          f   use archive file

The file will now be unpacked into the Shadow Domain's Document Root (main) directory.

2. Download all files in the temporary output directory to your local system and upload them from there to the Shadow Domain via FTP.

3. For your temporary output directory, select the Shadow Domain's main directory. This ensures that all generated phantom pages will be stored right where they belong. Note that to achieve this, you will have to set the Shadow Domain's main directory's permissions to: "chmod 777" [drwxrwxrwx]

We recommend either method #1 or #2 as this will enable you to test the whole procedure within a contained environment. Option #3 will probably be the method of choice for the more experienced user.

42. Field "Tarball file name"
-----------------------------
Define the tarball (a tar.gz packed archive file) file name (without system path) in this field. The generated tarball(s) will be stored in directory "tarballs".

Standard Mode: In Standard Mode, the character string defined here will be treated as an absolute name. Thus, entry "project-1.tar.gz" will give you a single tarball file named "project-1.tar.gz".

Bulk Mode: In Bulk Mode, any character strings entered here will be combined with your respective Shadow Domain(TM) names to create unique, easy to distinguish archive files. Thus, if you define e. g. "project-1.tar.gz" in this field, tarballs generated may be named "project-1-shadowdomain1.com.tar.gz", "project-1-shadowdomain2.net.tar.gz", etc.

43.
Field "Only files newer than" --------------------------------- When generating further phantom pages for an existing Shadow Domain at some later point, you may opt to create a tarball containing only these newly created pages. To do this, enter a date shortly before generation of your new phantom pages in this field. Date syntax: 2006-02-01 12:30:00 Now, upload this new, smaller tarball to the Shadow Domain and unpack it as outlined above. 44. Button "Generate tarball" ----------------------------- By clicking on this button, you will create and save the tarball(s). 45.1 Button "Display Submission URLs" ------------------------------------- From version 2.02.01, by default, the submission URLs list will no longer be displayed in the text area "Submission URLs" unless specifically called for. This is for performance reasons: when working from an extended list, it can, on some systems, take very long to actually display these URLs. As the list will typically only be edited once, it is more convenient and efficient to preclude it from uncalled-for display. To view this list on its dedicated page, click button "Display Submission URLs". 45.2 Text area "Submission URLs" -------------------------------- Upon request (see item 45.1 above), the newly generated phantom pages will be displayed in this window. These are the URLs which will also be saved in the submissions list file for submission to the search engines. 46. Text area "Logs" -------------------- For each job during which phantom pages are created, a log entry is written into a dedicated log file and displayed in this text area window. This allows you to check the various parameters specified during the generation process, enabling you to further fine tune the overall SEO process at a later point in time after conducting a thorough ranking analysis and success evaluation of your Shadow Domain's phantom pages. 
Example:
--------
2006-02-01, 02:14 -- Project Sports Site #1 -- http://www.myshadowdomain.com/ -- Job 02 -- keywords.txt -- descriptions.txt -- sportscontent.txt -- 5 kB -- 1.5 % -- 2 skips

This sample log excerpt displays:
- the date and time the logged job was conducted (i.e. "2006-02-01, 02:14")
- the Project Name (i.e. "Project Sports Site #1")
- the Shadow Domain for which phantom pages were generated (i.e. "http://www.myshadowdomain.com/")
- the number of the job run for this particular Shadow Domain (i.e. "Job 02")
- the name of the keyword file used (i.e. "keywords.txt")
- the name of the description file used (i.e. "descriptions.txt")
- the name of the fillertext content file used (i.e. "sportscontent.txt")
- the size of the body text generated (i.e. "5 kB")
- the keyword density specified (i.e. "1.5 %")
- the number of skips specified (i.e. "2")

The log file's name is: "log.txt"
It is located in directory: " admin"

-------------------------------------------------------
Step 5: Check + Control Shadow Domain (sdcontrol.cgi)
-------------------------------------------------------
After creating the phantom pages and crosslink files for your Shadow Domain in Step 4, and after uploading them to the SD, you will now want to implement the "Central Keyword Switch" (CKS) file "X.cgi".

-------------------------------------
ADJUSTMENTS IN FILE "X.cgi"
(please edit this file only in an ASCII, i. e. plain text, editor such as Notepad!)
-------------------------------------

System Path
-----------
* Please check your system's path to the location of Perl. The default path in the script is "/usr/bin/perl". If you don't know this path, you can look it up under Telnet by entering the Unix command "whereis perl". The system path will then be displayed for you to copy if required (see below).
If your system path to Perl is not "/usr/bin/perl", you will have to adjust the first line accordingly in the following script:

- X.cgi

Configuration of Script Parameters (Variables)
----------------------------------------------
The following adjustments are related exclusively to the "Central Keyword Switch" file "X.cgi".

* Please adjust the following variables to your requirements:

- "$standard"
Variable "$standard" denotes the core domain you want to redirect your "normal" visitors to (i.e. NO machines, searchbots, etc.)

- "$keyword_flag"
As a rule, human visitors will enter some keyword or search term on a search engine's main page and will then click on a listed phantom page's link which will transfer them to that page's Shadow Domain(TM). By defining variable "$keyword_flag" as 1, the visitor's search term can be included as an info field in the standard URL. This information can then be used in statistical analysis of your Core Domain's traffic. If you opt for this feature, the "$standard" variable's syntax requires some adjustment.

Two examples:

$standard = "http://www.coredomain.com/index.html?<>"
or:
$standard = "http://www.coredomain.com/affiliate.cgi?keyword=<>"

Thus, the character string <> is included at some position within the URL. The exact form of the URL depends on the manner in which it will be analyzed on the Core Domain, i. e. which script is assigned this task.

Default: the variable "$keyword_flag" is commented out, i. e. not active.

- RSS Feed Inclusion

- "use LWP::Simple;"
This call will integrate Perl module "LWP::Simple" in the overall process.

- "use XML::RSS;"
This call will integrate Perl module "XML::RSS" in the overall process.

Neither module is part of the Perl 5 core distribution, although they may already be installed on your server. Should one or both be missing from your system, they must be installed (e. g. from CPAN), as the RSS Feed functionality will not work otherwise.
(Note that this function is not mandatory: the overall fantomas shadowMaker(TM) will work fine, too, if you choose not to make any use of it.) - "$rss_flag" Set this variable to "1" if you want to include an RSS feed in any of your phantom pages. - "$rss_items" This variable lets you determine how many items of the RSS feed shall be included in your phantom pages. If this variable is commented out, all the RSS feed's items will be included. However, this is not recommended as it can dramatically blow up the phantom pages' size. By default, all these 4 variables are commented out, i. e. the RSS Feed Inclusion feature is not activated. - "$main_dir" Main directory (DocumentRoot) for your HTML pages. The absolute path is required. Examples: "/usr/www/htdocs/" "/var/www/html_public/" - "$stats_dir" Directory for log files and admin files. The absolute path is required. Examples: "/usr/www/htdocs/cgi-bin/stats/" "/var/www/html_public/cgi-bin/stats/" - "$hits_log_file" Log file listing SD hits (Default name is: "hits.log") - "$humans_log_file" Log file listing SD hits from human visitors (Default name is: "human-hits.log") - "$links_list_file" Links list file name as generated in step 4 (Default name is: "links.txt") - "$selist_file" Search engine referrer parsing routines (Default name is: "selist.txt") - "$botbase_dir" Directory of fantomas spiderSpy(TM) botBase file. The absolute path is required. 
Examples:
"/usr/www/htdocs/cgi-bin/stats/"
"/var/www/html_public/cgi-bin/stats/"

- "$botbase_file"
File containing spider robots list (Default name is: "spiderspy.txt")

The ".htaccess" file
--------------------
The file ".htaccess" should include as a minimum the following entries:

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteRule ^$ /cgi-bin/X.cgi?%{REQUEST_URI} [L]
RewriteCond %{REQUEST_URI} !/.*/
RewriteRule ^.*\.html$ /cgi-bin/X.cgi?%{REQUEST_URI} [L]

However, it is recommended to expand the .htaccess file by including the following condition:

RewriteCond %{REQUEST_FILENAME} -f

Explanation: This checks whether the called page actually exists on the domain. If not, a 404 error message will be triggered. If you don't include this condition, the page index.html will be displayed instead.

Thus, the complete expanded code is:

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteRule ^$ /cgi-bin/X.cgi?%{REQUEST_URI} [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_URI} !/.*/
RewriteRule ^.*\.html$ /cgi-bin/X.cgi?%{REQUEST_URI} [L]

(A sample ".htaccess" file featuring these entries is included with our package.)

This .htaccess file offers two functionalities:
1. All calls for HTML pages, be they generated by search engine spiders or by humans (web browsers), will first be redirected to the Central Keyword Switch (CKS)
2. All HTML pages in subdirectories will be displayed as normal HTML pages

Uploading Files to Your Web Server
----------------------------------
* The following files must be copied into the Shadow Domain's main directory ("DocumentRoot"):

- .htaccess
- robots.txt

A generic robots.txt file is included in our package which permits all spiders to crawl the phantom pages. The use of a robots.txt file is not mandatory.

* The following script must be copied into the directory for execution of CGI scripts. Usually, this will be directory /cgi-bin/.
- X.cgi Creating Subdirectories ----------------------- * Next, create the directories defined as the following variables BELOW your main directory: - "$stats_dir" - "$botbase_dir" Set directory permissions to: "chmod 777" [drwxrwxrwx] Typically, the two variables above will point to the same directory. This, however, is not mandatory. E.g. if you wish to implement several Shadow Domains on a single server, you might want to feed the fantomas spiderSpy(TM) botBase from a central directory while storing the individual Shadow Domains' log files decentrally in dedicated directories. Uploading the SE List File -------------------------- * Now, copy the following file into the directory defined under variable "$stats_dir": - selist.txt Uploading the Links List File ----------------------------- * Next, copy the following file into the directory defined under variable "$stats_dir": - links.txt This will only be required if you actually generated a matching Links List file during Step 4. Uploading empty Log Files ------------------------- * Now, copy the following blank files into the directory defined under variable "$stats_dir": - hits.log - human-hits.log Uploading the fantomas spiderSpy(TM) botBase -------------------------------------------- * Finally, copy the following file into the directory defined under variable $botbase_dir: - spiderspy.txt FTP Upload Mode --------------- * When uploading via FTP, make sure to transfer ALL files in ASCII mode (including the ".htaccess" file!). This is quite critical as about 90% of all installation problems are related to incorrect upload modes! 
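A frequent symptom of a wrong upload mode is that text files edited on a Windows machine arrive on the Unix server still carrying CR+LF line endings. The following Python sketch is an illustration only and is not part of the fantomas package; the helper name "has_crlf" and the file names passed on the command line are hypothetical. It assumes that a file containing CR+LF sequences was not converted during upload.

```python
import sys


def has_crlf(data: bytes) -> bool:
    """Return True if the byte string contains a Windows (CR+LF) line ending."""
    return b"\r\n" in data


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_crlf.py X.cgi selist.txt
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            found = has_crlf(handle.read())
        print(f"{name}: {'CR+LF found, re-upload in ASCII mode' if found else 'ok'}")
```

Run on the server against the uploaded text files, this gives a quick indication of which files may need to be transferred again.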
Assigning Proper File Permissions --------------------------------- * Assign the following required file permissions: .htaccess: "chmod 444" [-r--r--r--] X.cgi: "chmod 755" [-rwxr-xr-x] spiderspy.txt: "chmod 666" [-rw-rw-rw-] hits.log: "chmod 666" [-rw-rw-rw-] human-hits.log: "chmod 666" [-rw-rw-rw-] selist.txt: "chmod 444" [-r--r--r--] links.txt: "chmod 444" [-r--r--r--] UNINSTALLING THE CKS --------------------- For complete uninstall, delete the following files: .htaccess X.cgi Also, delete the following directory or whichever directory you defined under variables "$admin_dir" and "$botbase_dir" including all content: - stats WARNING ------- Deinstallation should always include the whole Shadow Domain! E.g. if you were to delete the Central Keyword Switch (CKS) only, the phantom pages could be read by any human visitor and no redirection to your Core Domain would be effected. Following deinstallation, you may also want to adjust your ".htaccess" file, restore it to its previous version, delete it altogether or whatever may be most pertinent to your system setup. Functionality of the Central Keyword Switch (CKS) ------------------------------------------------- All visitors' IP addresses will be checked by the CKS. If found belonging to a search engine spider, the phantom page will be read internally and fed to the spider. In this case, no redirection will take place and the spider will not notice the difference: it will crawl and index the phantom page just like any other web page. If no established search engine spider IP is detected, the visitor's Referrer data will be parsed for keywords/search phrases. If keywords/search phrases are found for which redirection instructions have been defined in the Links List, the visitor will be redirected to the predefined target URL. If no Referrer is detected, or if no specific target URL has been defined for the keywords/search phrases found, the visitor will be redirected to the defined standard URL. 
All hits are logged in the log file "hits.log". Search engine spider
hits are marked by two preceding exclamation marks: "!!".

The file "human-hits.log" logs only hits from human visitors and
spiders not assigned to search engines. (The latter may include
whackers, extractor bots, etc.)


Check + control your SD infrastructure
--------------------------------------

By calling "sdcontrol.cgi" from your web browser you can test and
control the organization of phantom pages on your web site.

1. Field "Project name"
-----------------------
You may enter any project name of your choosing in this field. For
example, this could be the name of the Shadow Domain, a client's
name, etc.

2. Field "Shadow domain (http://...)"
-------------------------------------
Enter your Shadow Domain's full URL in this field.
E.g.: http://www.yourshadowdomain.com/
This entry will be required for building the URLs in the text area
"Your stealth URLs".

3. Field "Main directory"
-------------------------
Enter your Shadow Domain's main directory. You may enter either an
absolute or a relative path, e.g.:

absolute: /usr/www/htdocs/yourshadowdomain/
relative: ../../yourshadowdomain

The relative path starts with the directory in which the script
"sdcontrol.cgi" is located. Obviously, defining an absolute path will
be the less complicated procedure.
This entry will be required for getting the filenames of the URLs in
your text area "Your stealth URLs".

4. Field "Hits log file (path/name)"
------------------------------------
Enter path and name of your log file listing Shadow Domain hits.
You may enter either an absolute or a relative path, e.g.:

absolute: /usr/www/htdocs/yourshadowdomain/stats/hits.log
relative: ../../yourshadowdomain/stats/hits.log

5. Field "botBase file (path/name)"
-----------------------------------
Enter path and name of your fantomas spiderSpy(TM) botBase file.
You may enter either an absolute or a relative path, e.g.: absolute: /usr/www/htdocs/yourshadowdomain/stats/spiderspy.txt relative: ../../yourshadowdomain/stats/spiderspy.txt 6. Button "Refresh" ------------------ By clicking this button, you can reload your log file. 7. Field "Your current I.P. address" ------------------------------------ In the middle part of the template you will see your current IP address (e.g. "123.156.7.111" or similar). This IP is also displayed in the field "Test IP". 8. Field "Hits since" --------------------- This field displays the date from which hits are calculated. 9. Field "Logfile size in bytes" -------------------------------- This field displays the logfile size. 10. Button "Set Test Mode" -------------------------- By clicking this button, you will store the IP in the botBase file. 11. Button "Reset Test Mode" ---------------------------- By clicking this button, you will delete the IP from the botBase file and restore the previous version. Description of Functionality Check ---------------------------------- (Replace the name "yourshadowdomain.com" with the name of the Shadow Domain you are actually testing. Similarly, replace the name "fantomas.html" with the name of the phantom page you are actually testing.) In your web browser, enter: < http://www.yourshadowdomain.com/fantomas.html > If you are redirected to the Core Domain specified when configuring and adapting the "central keyword switch", the setup is probably ok, depending on the next test. For this, click button "Refresh" and review the entries in the log file. If you cannot find an entry now, the configuration isn't right yet. Please review it before continuing! If you do find an entry, click button "Set Test Mode" to store your current IP (as entered in the field "Test IP") into the botBase file. Next, in your web browser call up the following again: < http://www.yourshadowdomain.com/fantomas.html > If you can see the phantom page now, the installation is ok. 
(Note that the phantom page will typically be quite ugly - this is
fine because it is not intended for human perusal anyway. This is how
optimized "spider fodder" will normally appear.)

Now, delete your own current IP from the botBase list by clicking
button "Reset Test Mode".

Only after these tests have been conducted successfully should you
submit phantom pages to the search engines in Step 6!

12. Button "Flush Log File"
---------------------------
The log file's content can be deleted from the control interface. To
cut down on download time, we suggest you delete the log file
regularly, e.g. (depending on your system's capacity) after 80-100
entries.

We strongly recommend downloading the log file (e.g. via FTP)
regularly to your local system for offline evaluation.

The display of data files in their windows can be configured by
setting the following variables in the configuration script
"SMSetup.pm":

$wraplog  = wrap logfile display (default is: "virtual");
            may alternatively be set to "off".
$wrapurls = wrap urls list display (default is: "off");
            may alternatively be set to "virtual".

Because different web browsers interpret the "wrap" command in
textarea fields differently, the visual display can be customized
with these two variables.


----------------------------------------------------------
Step 6: Submit pages to search engines (submitpages.cgi)
----------------------------------------------------------

1. Field "Project name"
-----------------------
You may enter any project name of your choosing in this field. For
example, this could be the name of the Shadow Domain, a client's
name, etc.

2. Field "Submission list file (name)"
--------------------------------------
When starting the script submitpages.cgi in batch mode, it will read
the submission URLs from a predefined text file. Enter the submission
file name (without its system path) in this field.
This file was generated during step 4 and was stored in directory:
"admin/admin_submissions/".
If you want to use your own ready-made file with submission URLs, please proceed as outlined below. The submission URLs must be entered one per line. Example: http://www.yourshadowdomain.com/page1.html http://www.yourshadowdomain.com/page2.html http://www.yourshadowdomain.com/page3.html http://www.yourshadowdomain.com/page4.html http://www.yourshadowdomain.com/page5.html Blank lines and lines commented out with # will be skipped. Please create this file manually in ASCII mode with a text editor. Then, upload it by FTP in ASCII mode to the directory: "admin/admin_submissions/". IMPORTANT ========= Set file permissions to: "chmod 666" [-rw-rw-rw-] After execution, the batch job will look similar to the following example: # 2002-10-07, 20:08 ok: http://www.yourshadowdomain.com/page1.html # 2002-10-07, 20:08 not ok: http://www.yourshadowdomain.com/page2.html: Not Found 404 # 2002-10-07, 20:08 ok: http://www.yourshadowdomain.com/page3.html http://www.yourshadowdomain.com/page4.html http://www.yourshadowdomain.com/page5.html The entries "ok" or "not ok" relate solely to availability of the submission URLs. The detailed messages generated by the search engines will be listed in the e-mail submission report. The last two URLs in our example given above will be submitted during the batch job's next run. 3. Text area "URLs (format: http://...)" ---------------------------------------- If you want to work in online mode, enter the URLs you want to submit into text area "URLs (format: http://...)". Each URL must be entered in a separate line. Example: http://www.yourshadowdomain.com/page1.html http://www.yourshadowdomain.com/page2.html http://www.yourshadowdomain.com/page3.html http://www.yourshadowdomain.com/page4.html You can enter an unlimited number of URLs. It is recommended to limit submissions to 5 URLs per domain and day as some search engines may ignore or even penalize larger numbers of submissions in the course of their spam prevention measures. 4. 
Text area "Select search engines!"
-------------------------------------
You can select search engines via mouse click in the upper selection
field "Select search engines!"

If you wish to select the whole array at once, click on the button
marked "Toggle" (Defaults to ALL/Single engines). See more info on
this button below.

Repeated clicking of this button will toggle between selection of all
and single search engines. After toggling, you can select single
targeted search engines. Multiple selections are made by holding down
the CTRL key while clicking.


E-mail Addresses
----------------

5. Field "E-mail for submission:"
---------------------------------
In field "E-mail for submission:" enter the valid e-mail address you
want to submit to search engines.

Example: webmaster@YourShadowDomain.com

Some search engines (e.g. Alltheweb/FAST and MSN) will require this
address to finalize submission. This field must be filled in even for
submission to search engines which do not specifically demand an
e-mail address.

6. Field "Report via e-mail to:"
--------------------------------
In the following field "Report via e-mail to" enter the e-mail
address the submission report shall be sent to.

Example: YourName@YourShadowDomain.com

If you have checked the "Send Report" box, this entry is MANDATORY as
a report will be dispatched with every submission!

7. Field "cc:"
--------------
In the field "cc" you may enter a second e-mail address (optional).

Example: YourClient@ClientsDomain.com

This feature allows you to send an additional report to those clients
on whose behalf you are submitting pages to search engines, to
another company department, etc.

8. Field "Subject for e-mail report"
------------------------------------
Enter subject line content for the e-mail report in field "Subject
for e-mail report".
After the submission process has been finalized, you will receive a
submission report by e-mail.
This report will consist of various attachments: an overview of submission results and - in case of failed submissions - the error messages generated by the search engines. 9. Field "Attachment name" -------------------------- In the field "Attachment name" enter the attachment name for the submission results overview. 10. Field "Attachment extension" -------------------------------- In the field "Attachment extension" you can customize the extension (file suffix) for submission results reports sent as e-mail attachments. E.g.: srr, txt, rep, etc. Make sure to enter the extension without period (dot), i.e. "txt", not ".txt"! 11. Checkbox "Append Date" -------------------------- If you wish to include the current date in your e-mail report's subject line and in the report's file name, check the box marked "Append Date". Example: Subject for e-mail report: shadowMaker: Submission Results Results file name: sM- File extension: .html After checking the box "Append Date", this will give you: Subject: shadowMaker: Submission Results 2002-10-07 Filename: sM-2002-10-07.html This feature improves easy management of your submission reports. 12. Checkbox "Check URLs" ------------------------- Checking the last box marked "Check URLs" will make the program check all URLs you wish to submit for availability before the submission process is started. 13. Button "ALL/SINGLE engines/Toggle" -------------------------------------- Provided your browser's JavaScript function is enabled, you may make use of the button "ALL/SINGLE engines/Toggle" to select the engines to submit to. 14. Radio button "Send Full Report" ----------------------------------- If you wish to receive a submission report by e-mail, including the relevant search engine's message page (in HTML format) for every unsuccessful submission, check the radio button marked "Send Full Report". 15. 
Radio button "Send Summary Report"
--------------------------------------
If you wish to receive the submission overview report only (by
e-mail), check the radio button marked "Send Summary Report".

16. Radio button "Send No Report"
---------------------------------
If you don't wish to receive the submission report by e-mail, check
the radio button marked "Send No Report".

17. Checkbox "Save Results File"
--------------------------------
Checking the box marked "Save Results File" will save the submission
results overview to a file on your server.

18. Field "Results file name"
-----------------------------
In the field "Results file name" enter the file name for the
submission results overview.
This file will reside in the directory: "admin/admin_submissions/".
Required file permissions: "chmod 666" [-rw-rw-rw-]

19. Button "Submit URLs"
------------------------
After entering all required data, you can initialize the submission
process by clicking button "Submit URLs".

Initialization of the submission process will be confirmed in a new
template. If you wish to continue with additional submissions, click
button "back to fantomas shadowMaker". This will lead you back to the
main template.

The submission will be handled by a new UNIX process which will run
in your system's background. Hence, you do not need to stay connected
to your web server or monitor the procedure. Instead, you may
dedicate this time to other activities.


Submission Report by E-mail
---------------------------

After submission of URLs to selected search engines, you will receive
a digest by e-mail (HTML attachment) at the address you specified.
This report lists both successful and unsuccessful submissions. For
every unsuccessful submission, the relevant search engine's message
page is attached (in HTML format) to the e-mail report for your
perusal. To save on bandwidth, graphics files will not be
transmitted.
Reasons for unsuccessful submission may vary: the search engine's
server is down; search engine will only accept one URL per day;
search engine will only accept submission of root domain; etc.

20. Button "Customize"
----------------------
For the experienced user only!
By clicking the button "Customize" below the search engine selection
field, you will be guided to the maintenance template. Here, you may
modify search engine entries online or add new entries to the list.


Maintenance of Search Engines List
----------------------------------

For the experienced user only!
You can customize the search engine submission strings to accommodate
changes, add new engines, etc.

The following syntax is mandatory, e.g.:

§§§§§§
engine = Lycos
search = http://www.lycos.com
referer = http://www.lycos.com/addasite.html
submit = http://www.lycos.com/cgi-bin/spider_now.pl?query=[URL]&e-mail=[EMAIL]
ack = We successfully spidered your page

Explanation:

§§§§§§    Marks new entry.
engine =  Name of search engine. Freely configurable.
search =  URL of search engine. Required for submission.
referer = URL of submission page. Required by some engines.
submit =  URL of search engine's submission script.
          The expressions [URL] and [EMAIL] are placeholders for the
          URL and the e-mail address to be submitted.
          The square brackets [ and ] MUST BE INCLUDED!
          For search engines that don't require an e-mail address
          you may leave out expression [EMAIL].
ack =     Text string of acknowledgment page as displayed by search
          engine after successful submission. This text must match
          the displayed message exactly.
          Recommendation: avoid umlauts and extended characters.

The equal sign "=" must be set as in the example above. Blanks before
or after the equal sign are optional.

You may enter comment lines beginning with "#". These will be skipped
when executing the program.

E.g.: # This is a sample comment.

To save your modifications, click button "Save!".
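The entry syntax described above can be illustrated with a short Python sketch. This is not the parser used by fantomas shadowMaker(TM) (which is written in Perl); the function names "parse_engine_list" and "build_submit_url" are hypothetical, and URL-encoding the substituted values is an assumption made for this example.

```python
from urllib.parse import quote

ENTRY_MARK = "§§§§§§"  # marks the start of a new engine entry


def parse_engine_list(text):
    """Parse engine list text into a list of dicts, one per entry."""
    engines = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if line == ENTRY_MARK:
            current = {}
            engines.append(current)
            continue
        if current is None or "=" not in line:
            continue  # ignore stray lines outside an entry
        # Split at the FIRST "=" only; the submit URL contains more of them.
        key, value = line.split("=", 1)
        current[key.strip()] = value.strip()
    return engines


def build_submit_url(entry, url, email):
    """Fill the [URL] and [EMAIL] placeholders (encoding is an assumption)."""
    return (entry["submit"]
            .replace("[URL]", quote(url, safe=""))
            .replace("[EMAIL]", quote(email, safe="")))
```

Splitting each line at the first equal sign only keeps the query string of the "submit" value intact, and stripping whitespace reflects the rule that blanks around the equal sign are optional.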
To return to main template, click button "back to fantomas
shadowMaker".

You may, of course, edit the search engine list offline and upload it
later by FTP.

For search engines which require the HTTP GET method for calling the
submission script, the variables above can be parsed from the URL.

However, search engines employing the HTTP POST method won't transfer
their parameters via the URL. In these cases, parameters such as
"url" and "e-mail" can be parsed from the HTML page's source code.
Thus, a submission URL can be constructed.

Example:
submit = http://www.searchengine.com/submit.cgi?url=[URL]&e-mail=[EMAIL]

fantomas shadowMaker(TM) will split this URL up again internally and
will then feed the parameters to the search engine in question.


BATCH MODE CONFIGURATION EDITOR
-------------------------------

You can manage your search engine submissions automatically by
defining a cron job. For this, you need to define the batch job's
parameters prior to setting up the cron job.

The Batch Mode Configuration Editor will assist you in generating a
configuration defining the selected search engines to submit to, the
e-mail addresses to deliver submission reports to, and further
template variables.
All these variables are saved in interactive mode in a parms file.
You can define an unlimited number of different configurations for
batch mode. They will all be stored in different parms files.

21. Field "Cron configuration file name"
----------------------------------------
Define the name of the file containing the cron job's parameters in
the field marked "Cron configuration file name". The script will
automatically add the extension (suffix) ".cron.txt" to this defined
file name.

Example:
Entry = "cron.parm1"
Extended file name = "cron.parm1.cron.txt"

22. Field "Number of URLs to submit"
------------------------------------
In the field marked "Number of URLs to submit" you can limit the
number of URLs to be submitted during any given submission cycle.
Example:
Entry = "5"
This will limit the number of URLs submitted per batch job to 5.

It is recommended to limit submissions to 5 URLs per domain and day
as some search engines may ignore or even penalize larger numbers of
submissions in the course of their spam prevention measures.

If you configure your cron job to run several times per day, please
take this into consideration when limiting the number of overall
submissions. This number relates to "submissions per batch job run",
not to submissions per day!

23. Checkbox "Submit unlimited number of URLs"
----------------------------------------------
If you don't want to limit the number of submissions, check the box
marked "Submit unlimited number of URLs". This will cause the script
to submit all URLs listed in the "Submission list file" in one single
go.


Saving Your Configuration
-------------------------

24. Button "Save Batch Mode Configuration"
------------------------------------------
You can store the configuration values by clicking button "Save Batch
Mode Configuration". The parameters will be written into the file you
specified under "Cron configuration file name:".

The files containing your batch mode configuration parameters will be
stored in the directory "admin/admin_submissions/".

Once the files have been created and saved, they will be displayed in
the lower part of the template. Listed to the right you will see the
pertinent "Command line strings".

Before starting a cron job, please test the batch mode first!
To do this, access your server by Telnet and enter the required
"Command line strings" in the command line. The easiest and most
comfortable way to do this is by cutting and pasting the string,
provided your Telnet client offers this functionality.

If everything works out ok, you will receive the submission report by
e-mail, and the "Submission list file" will be modified accordingly
as outlined above.

Now you can proceed with installing the cron job.
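The way a batch run works through the "Submission list file" (one URL per line, blank lines and "#" lines skipped, processed entries commented out with a time stamp, remaining URLs left for the next run) can be sketched in Python as follows. This is an illustration only, not the actual Perl implementation: the function name "process_submission_list" and the "is_available" callback are hypothetical, and counting every processed URL against the per-run limit is an assumption.

```python
def process_submission_list(lines, limit, is_available, stamp):
    """Return the updated list file content after one batch run.

    lines        -- current lines of the submission list file
    limit        -- maximum number of URLs per run, or None for unlimited
    is_available -- callback returning (ok, reason) for a URL (hypothetical)
    stamp        -- time stamp string, e.g. "2002-10-07, 20:08"
    """
    updated = []
    processed = 0
    for line in lines:
        url = line.strip()
        skip = not url or url.startswith("#")
        if skip or (limit is not None and processed >= limit):
            updated.append(line)  # keep untouched for this run
            continue
        processed += 1
        ok, reason = is_available(url)
        if ok:
            updated.append(f"# {stamp} ok: {url}")
        else:
            updated.append(f"# {stamp} not ok: {url}: {reason}")
    return updated
```

Because processed lines are rewritten as comment lines, the next run skips them automatically and continues with the first URL that is still pending.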
======================================================================

CONFIGURATION OF CRON JOBS
--------------------------

Cron is a mechanism for planning and scheduling batch jobs. The
daemon "crond" is started automatically on system boot up. It runs
one check per minute to see if there are any jobs to execute. The
list of jobs to execute is created by the program "crontab".

The following commands work from the assumption that you are either
logged in by Telnet or locally on your Unix system.

Entering the command "crontab -l" will display a list of current
entries. By default, only entries owned by the logged in user will be
displayed. Existing lists can be removed/deleted with command
"crontab -r".

To create a new list, it is recommended to read the entries from a
file using command "crontab filename". The following examples will
show you the format of this file. The file itself is created with an
ASCII text editor.

Example:

0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi cron_parm1.cron.txt

This entry consists of six parameters. The first five parameters
define the time schedule, whereas the sixth parameter contains the
command for executing the job. In our example above, this command
consists of:

- the full path and file name of the script
- an argument

This latter argument defines the batch configuration for fantomas
shadowMaker(TM).

This command can be transferred by cut and paste from the list of
"Command line strings" displayed in the lower section of the "BATCH
MODE CONFIGURATION EDITOR" of the GUI.

Parameters defining the time schedule are:

minute        (0-59)
hour          (0-23)
day of month  (1-31)
month         (1-12)
day of week   (0-6), 0 = Sun

Hence, the above sample entry:

0 12 * * *

can be translated as:
If Minute = 0 and Hour = 12, the script will be executed. Because the
last three scheduling parms are defined by wildcard character "*",
the job will be executed every day.
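How the five scheduling parameters are evaluated can be illustrated with a small Python sketch. It is a simplified model for explanation only, not a replacement for cron: it supports only plain numbers and the wildcard "*" (no lists, ranges or steps), and the function names "split_entry" and "is_due" are hypothetical.

```python
from datetime import datetime


def split_entry(entry):
    """Split a crontab line into its five schedule fields and the command."""
    parts = entry.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five schedule fields plus a command")
    return parts[:5], parts[5]


def field_matches(field, value):
    """A field matches if it is the wildcard or equals the current value."""
    return field == "*" or int(field) == value


def is_due(entry, when):
    """Return True if the entry would run in the minute given by 'when'."""
    (minute, hour, dom, month, dow), _command = split_entry(entry)
    cron_dow = when.isoweekday() % 7  # cron counts 0 = Sunday ... 6 = Saturday
    if not (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(month, when.month)):
        return False
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        # If both day fields are restricted, cron runs when either matches.
        return dom_ok or dow_ok
    return dom_ok and dow_ok


entry = "0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi cron_parm1.cron.txt"
print(is_due(entry, datetime(2002, 10, 7, 12, 0)))   # noon: due
print(is_due(entry, datetime(2002, 10, 7, 12, 1)))   # one minute later: not due
```

The sketch mirrors the once-per-minute check performed by "crond": an entry runs only in a minute for which every restricted field agrees with the current date and time.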
Scheduling Week Days -------------------- If you wish to run the script on Mondays only, the following entry will do the trick: 0 12 * * 1 /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi cron_parm1.cron.txt Scheduling Turn of Month ------------------------ You can schedule the turn of the month in this manner: 0 0 1 * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi cron_parm1.cron.txt Configurations for Multiple Domains ----------------------------------- If you are maintaining multiple domains, you can create a separate job for each domain. E.g. you may create a file named "crontabfilename.txt" and enter the following command lines: 0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi domain1.cron.txt 0 14 * * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi domain2.cron.txt 0 16 * * * /usr/www/htdocs/yourdomain/cgi-bin/submitpages.cgi domain3.cron.txt The respective argument "domain1.cron.txt" defines the file containing the pertinent domain's batch configuration. Next, the command "crontab crontabfilename.txt" will transmit this file to crontab. IMPORTANT ========= If you have crontab configured for prior jobs already, you must include them in the new file "crontabfilename.txt" (example), as the command "crontab crontabfilename.txt" will override all previous cron jobs owned by the specific user calling crontab. For further explanations under Unix, you can select from the following commands: man crontab man 5 crontab man cron Process Center ============== The script monitor.cgi will list all fantomas shadowMaker(TM) processes currently running on the server system. The table "Process list" displays the following data: 1. Script The running script's name. 2. PID Short for "process identifier", a unique numeric tag assigned by the Linux operating system to each running process. 3. Value Depending on the script displayed, this column will list the respective values for the script(s) currently being processed. 
- Value for script "geturls.cgi": the KEYWORD currently being
  processed during the search engine query.

- Value for script "getcontent.cgi": the URL currently being
  spidered.

As a rule, every active script will spawn multiple child processes.
Hence, the script name may be listed several times in the table.
Only one of these child processes will handle a single value at any
given time. Thus it is not unusual for the script name to be
displayed multiple times, whereas only one value is listed for one of
these instances, with the other child processes displaying blank
values.

Clicking on the button "Reload" will refresh the list of current
processes.


Killing Processes
-----------------

To kill one or several processes, select the corresponding
checkbox(es) and hit button "Kill process(es)".

When killing processes, it is recommended to select ALL checkboxes
assigned to a given script name. This way, you can kill the child
processes in tandem with their respective parent process.

Note that this procedure may require repeating if a fresh child
process happens to have been triggered in the meantime (i.e. since
the last display refresh).

For security reasons, only processes displaying a PID will be killed.


The fantomas spyFetcher(TM) Module: Automatic botBase Maintenance
-----------------------------------------------------------------

SYSTEM REQUIREMENTS
INSTALLATION - UNIX
UNINSTALLING THE PROGRAM
WORKING WITH fantomas spyFetcher(TM)
CONFIGURATION OF CRON JOBS
ERROR HANDLING
KNOWN ISSUES
UPDATES + PROGRAM CHRONOLOGY
CONTACT + SUPPORT

======================================================================

SYSTEM REQUIREMENTS
-------------------

Language
-------
Perl 5

Utility
-------
GNU Wget (command-line download program)
More info under:
< http://www.gnu.org/software/wget/wget.html >

UNIX
----
The Unix system requires an installed web server. Execution of CGI
scripts must be enabled. A directory for execution of CGI scripts
must exist. Usually, this will be directory /cgi-bin/.
Tested under:
SuSE LINUX with Apache
Red Hat Linux with Apache
BSDI Unix with Apache

Browser
-------
Script is called and executed via web browser. You will currently
achieve best results under MS Internet Explorer 5+. Netscape 4.7 may
require adjustment of font size.

Tested under: IE5+, Netscape 4.7, Netscape 6.1, Opera 5.12

======================================================================

INSTALLATION
------------

The following files are included:

spyfetcher.cgi   --- (program script)
locs-sfe_en.txt  --- (language file)
fantomas.gif     --- (logo/graphics file)
sfehelp_en.txt   --- (documentation in TXT format)
                     <--- THIS FILE YOU ARE READING!
fa_license-e.txt --- (License Agreement And Terms Of Usage -
                     PLEASE READ!)

-----------------------------------
ADJUSTMENTS IN FILE "spyfetcher.cgi"
(please edit in ASCII or plain text editor like Notepad etc.)
-----------------------------------

UNIX
----

* Please check path to location of Perl. The default path in the
  script is "/usr/bin/perl". If you don't know this path, you can
  check it out under telnet by entering Unix command "whereis perl".
  You may have to adjust the first line in the script
  "spyfetcher.cgi" accordingly.

* The variables in the script "spyfetcher.cgi":
  "$stats_dir", "$robot_file", "$log_file", "$wget_cmd", "$sendmail",
  "$from_mail", "$to_mail", "$subject", "$cloak_for_google", "$user"
  and "$pw" may optionally be adjusted to your requirements.
  A comprehensive description of these variables can be found below
  in chapter "WORKING WITH fantomas spyFetcher(TM)".

* The script "spyfetcher.cgi" and the file locs-sfe_en.txt must be
  copied into the Unix server's CGI directory.

* The CGI directory must have the following permissions:
  "chmod 755" [drwxr-xr-x]

* Next, create the directory defined as variable "$stats_dir" with
  the following permissions:
  "chmod 777" [drwxrwxrwx] (Default name is "stats".)
* Finally, create the following directories with the permissions "chmod 755" [drwxr-xr-x] BELOW your MAIN directory: docs/ graphics/ Uploading the Online Help File ------------------------------- * Copy the following file into the directory "docs/": sfehelp_en.txt Uploading the Logo/Graphics File ------------------------------- * Finally, copy the following file into the directory "graphics/": - fantomas.gif * When uploading via FTP, make sure to transfer ALL files in ASCII mode. EXCEPTION: the graphics file "fantomas.gif" which must be transferred in BINARY or AUTOMATIC mode. * Required file permissions: spyfetcher.cgi: "chmod 755" [-rwxr-xr-x] locs-sfe_en.txt: "chmod 444" [-r--r--r--] fantomas.gif: "chmod 444" [-r--r--r--] sfehelp_en.txt: "chmod 444" [-r--r--r--] ====================================================================== UNINSTALLING THE PROGRAM ------------------------ For complete uninstall, delete the following: spyfetcher.cgi locs-sfe_en.txt fantomas.gif sfehelp_en.txt Also, delete the following directories including all content: stats (or whatever directory you defined under "$stats_dir") docs graphics ====================================================================== WORKING WITH fantomas spyFetcher(TM) ------------------------------------ Program Description ------------------- The fantomas spyFetcher(TM) is a script which allows you to get the latest fantomas spiderSpy(TM) botBase as a packed archive in .ZIP format. The botBase will be unpacked and saved on your server in the directory defined under "$stats_dir" with the file name defined under "$robot_file". 
---------------------------------
Customization of script variables
---------------------------------
The following variables may optionally be customized in the script
"spyfetcher.cgi":

* $stats_dir
  This variable defines the directory in which the spider robots list
  file resides. Specify it as an absolute path.
  Example: "/usr/www/htdocs/yourdomain/cgi-bin/stats"

* $robot_file
  This variable defines the file name of the spider robots list file.
  Default file name is "spiderspy.txt".

* $log_file
  This variable defines the file name of the transfer log file.
  Default file name is "transfer.log".

* $wget_cmd
  This variable defines the command call for wget.
  Default configuration is "/usr/bin/wget". If you do not know this
  path, you can look it up in a Telnet session by entering the Unix
  command "whereis wget". Otherwise, please ask your system
  administrator.


Email Error Message
-------------------
If the script is executed in batch mode via cron job, an email error
message will be generated if the transfer of the fantomas
spiderSpy(TM) botBase fails. For this email functionality you will
need to specify the following variables:

* $sendmail
  This variable defines the command call for the mail program.
  Default configuration is "/usr/lib/sendmail -t -n -oi". If you do
  not know this path, you can look it up in a Telnet session by
  entering the Unix command "whereis sendmail". Otherwise, please ask
  your system administrator.

* $from_mail
  This variable defines the sender's address of the email error
  message.

* $to_mail
  This variable defines the address to which the email error message
  is sent.

* $subject
  This variable defines the subject line of the email error message.

* $cloak_for_google
  If you want to cloak for Google, set "$cloak_for_google = 1". The
  Google spider entries in the fantomas spiderSpy(TM) botBase will
  then be activated.
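As an alternative to typing "whereis" for each program, the lookup of
the paths needed for the first script line, "$wget_cmd" and "$sendmail"
can be sketched as a small shell helper. The name "find_tool" is
hypothetical, and the extra directories it searches are merely assumed
to be common locations; they are not taken from this documentation.

```shell
#!/bin/sh
# Sketch only: find_tool is a hypothetical helper, not part of
# spyfetcher.cgi. It prints the first executable path it finds.
find_tool() {
    name=$1

    # First look the program up in the current PATH.
    if path=$(command -v "$name" 2>/dev/null) && [ -x "$path" ]; then
        echo "$path"
        return 0
    fi

    # Then try a few directories that are often not in PATH
    # (assumed locations; sendmail frequently lives in one of them).
    for dir in /usr/sbin /usr/lib /usr/local/bin; do
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

for tool in perl wget sendmail; do
    find_tool "$tool" || echo "$tool: not found, please ask your system administrator"
done
```

The printed paths are only candidates; copy them into the script after
checking that they point to the programs you intend to use.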
User Authentication
-------------------
When you signed up for the spiderSpy service, you received a user ID
and a password for downloading the fantomas spiderSpy(TM) botBase.

* $user
  This variable defines your user ID (case sensitive).

* $pw
  This variable defines your password (case sensitive).

******************** VERY IMPORTANT! *********************
If the variables "$user" and "$pw" are not correct, the download
will fail because access is forbidden.
SO PLEASE MAKE SURE TO SPECIFY YOUR ID AND PW EXACTLY AS ISSUED
DURING SIGNUP!
******************** VERY IMPORTANT! *********************


ONLINE MODE
-----------
* The script is activated by entering the appropriate URL into the
  web browser's location/address field, e.g.
  "http://www.yourdomain.com/cgi-bin/spyfetcher.cgi".

  To start the download of the current version of the fantomas
  spiderSpy(TM) botBase, click the button "Submit!".

  If the botBase has been saved on your server, the next HTML page
  will display the message:
  "Transfer of fantomas spiderSpy(TM) botBase successful!"


BATCH MODE
----------
You can manage the transfers of the fantomas spiderSpy(TM) botBase
automatically by defining a cron job.


======================================================================

CONFIGURATION OF CRON JOBS
--------------------------
Cron is a mechanism for planning and scheduling batch jobs. The daemon
"crond" is started automatically on system boot. It checks once per
minute whether there are any jobs to execute. The list of jobs to
execute is created by the program "crontab".

The following commands assume that you are logged in to your Unix
system either by Telnet or locally.

Entering the command "crontab -l" will display a list of current
entries. By default, only entries owned by the logged-in user will be
displayed.
Existing lists can be deleted with the command "crontab -r".
To create a new list, it is recommended to read the entries from a
file using the command "crontab filename".
The following examples show the format of this file. The file itself
is created with an ASCII text editor.

Example:

0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher.cgi start

This entry consists of six parameters. The first five parameters
define the time schedule, whereas the sixth parameter contains the
command for executing the job. In our example above, this command
consists of:

- the full path and file name of the script
- an argument

Parameters defining the time schedule are:

  minute       (0-59)
  hour         (0-23)
  day of month (1-31)
  month        (1-12)
  day of week  (0-6), 0 = Sun

Hence, the above sample entry:

0 12 * * *

can be translated as:
If Minute = 0 and Hour = 12, the script will be executed. Because the
last three scheduling parameters are defined by the wildcard character
"*", the job will be executed every day.


Scheduling Week Days
--------------------
If you wish to run the script on Mondays only, use the following
entry:

0 12 * * 1 /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher.cgi start


Scheduling Turn of Month
------------------------
You can schedule the turn of the month (midnight on the first day of
every month) in this manner:

0 0 1 * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher.cgi start


To Summarize
------------
Create a text file (e.g. "crontab.txt") and write the appropriate
command on one single line.
We recommend downloading the fantomas spiderSpy(TM) botBase once per
day. The following syntax will generate (as explained above) a cron
job which will run once a day:

0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher.cgi start

IMPORTANT
=========
Please modify the TIME OF DAY specified for your cron job to prevent
all downloads from happening at the same time. With hundreds of
subscribers, this could overload our server.

Prevention of abuse:
A maximum of six downloads of the botBase is permitted per day. Beyond
that, the downloading IP will be blocked by our system.
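One way to follow the advice above and move away from the 12:00 slot
is to let a small helper write the crontab line with a time of day of
your choosing, or with a randomly picked one. The following is a
minimal sketch; the function name "cron_line" is hypothetical, and the
script path is the example path used in this chapter, not a path that
is valid on your system.

```shell
#!/bin/sh
# Sketch only: cron_line is a hypothetical helper that prints one
# daily crontab entry. $1 = minute (0-59), $2 = hour (0-23),
# $3 = absolute path of the script.
cron_line() {
    minute=$1
    hour=$2
    script=$3

    # Accept plain non-negative integers only.
    case $minute in ''|*[!0-9]*) return 1 ;; esac
    case $hour in ''|*[!0-9]*) return 1 ;; esac

    # Enforce the ranges listed above.
    [ "$minute" -le 59 ] && [ "$hour" -le 23 ] || return 1

    printf '%s %s * * * %s start\n' "$minute" "$hour" "$script"
}

# Pick a pseudo-random time of day (awk's srand() is seeded from the clock).
times=$(awk 'BEGIN { srand(); printf "%d %d", int(rand() * 60), int(rand() * 24) }')
cron_line "${times% *}" "${times#* }" /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher.cgi
```

The printed line can be written to the text file (e.g. "crontab.txt")
mentioned above. The sketch only prints the entry; it does not install
it.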
Enter the absolute path for the script as valid for *YOUR* specific
system configuration. The argument to use is "start", as shown in our
example above.

Next, the command "crontab crontab.txt" will transmit this file to
crontab.

IMPORTANT
=========
If crontab has already been configured for other jobs, you must
include them in the new file (e.g. "crontab.txt"), because the command
"crontab crontab.txt" replaces all previous cron jobs owned by the
user calling crontab!

For further online explanations under Unix, you can use one of the
following commands:

  man crontab
  man 5 crontab
  man cron


======================================================================

ERROR HANDLING
--------------
This section covers individual error messages.


Stats directory
---------------
"Stats directory ... does not exist!"

Please create the stats directory or adjust the directory name under
variable "$stats_dir".


Download error
--------------
"Download of fantomas spiderSpy(TM) botBase failed!"

Possible issues:

* Call of wget is not functional.
  Solution: Please check your system's wget functionality.

* The access data specified (user ID and password) for the botBase
  are invalid.
  Solution: Please check your user ID and password.

* New files could not be created in the stats directory (defined
  under "$stats_dir").
  Solution: Please check the permissions of the directory:
  "chmod 777" i.e. [drwxrwxrwx]


Unzip error
-----------
"Unzip of fantomas spiderSpy(TM) botBase failed!"

Possible issues:

* Call of gunzip is not functional.
  Solution: Please check your system's gunzip functionality.


Change mode error
-----------------
"Change mode of fantomas spiderSpy(TM) botBase failed!"

Possible issues:

* Call of chmod is not functional.
  Solution: Please check your system's chmod functionality.
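Several of the causes listed above can be checked from a shell before
the script is run. The following is a rough sketch, not part of
spyfetcher.cgi: the function name "preflight" and its messages are
hypothetical, and the tool check assumes that wget, gunzip and chmod
are reachable through PATH, whereas the script itself calls wget via
"$wget_cmd".

```shell
#!/bin/sh
# Sketch only: preflight is a hypothetical helper.
# $1 = the directory you configured under $stats_dir.
preflight() {
    stats_dir=$1
    status=0

    # Corresponds to the "Stats directory ... does not exist!" message
    # and to the "new files could not be created" cause.
    if [ ! -d "$stats_dir" ]; then
        echo "stats directory missing: $stats_dir"
        status=1
    elif [ ! -w "$stats_dir" ]; then
        echo "stats directory not writable: $stats_dir"
        status=1
    fi

    # Corresponds to the download, unzip and change mode errors.
    for tool in wget gunzip chmod; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool"
            status=1
        fi
    done

    return "$status"
}

# Example call with the example path from this documentation.
preflight /usr/www/htdocs/yourdomain/cgi-bin/stats || echo "please fix the issues listed above"
```

Invalid access data cannot be detected this way; for that cause,
compare "$user" and "$pw" with the data issued during signup.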
======================================================================

KNOWN ISSUES
------------

Graphics
--------
Graphics files uploaded to the CGI directory or to a directory below
it may not be displayed correctly under some web server
configurations. In this case you may create a directory outside of
the cgi-bin. You can then define the "$graphics_dir" variable in the
program file spyfetcher.cgi accordingly.

Example: $graphics_dir = "../graphics/";


Docs (Manual/Help files)
------------------------
If the help file is not displayed correctly, we recommend uploading it
to an alternate directory (outside of cgi-bin!) as well. You can then
define the "$doc_dir" variable in the program file spyfetcher.cgi
accordingly.

Example: $doc_dir = "../docs/";


======================================================================

ERROR HANDLING
--------------
This section covers individual error messages.


Initialization of Password
--------------------------
"Please create directory: admin/"

The password must be written to the file "user.txt" in the directory
"admin/". For this, the directory must be installed as explained in
chapter "Installation".

---

"Passwords do not match. No password change effected."

If the password entered in the field "Re-enter password" differs from
the new password specified in the field above, please re-enter the new
password.
NOTE: Password entries are case sensitive!


Changing your password
----------------------
To change your password, you have to enter your old one first. If you
do not enter the old password, you will receive the message:
"Incorrect password!"

Should you have forgotten your password, simply delete the file
"user.txt" and re-initialize password storage.

---

"Cannot write to file ...!
Please check file permissions:
UNIX: "chmod 666" i.e. [-rw-rw-rw-]"

The new password will be written to the file "user.txt" (default).
For this, the file must be writable.


Opening files
-------------
"File ... could not be opened!
File does not exist!"

Please upload the file to your server.

"File ... could not be opened!
Please check file permissions:
UNIX: "chmod 666" i.e. [-rw-rw-rw-]"

Please grant the proper permissions.


Saving files
------------
"Please create directory: ...!"

To save the file, the directory must be installed as explained in
chapter "Installation".

"File ... could not be created!
Please check permissions of ... directory:
UNIX: "chmod 777" i.e. [drwxrwxrwx]"

To save the file, the directory must be writable.

"Cannot write to file ...!
Please check file permissions:
UNIX: "chmod 666" i.e. [-rw-rw-rw-]"

The file must be writable.


Deleting files
--------------
"File ... could not be flushed.
Either file does not exist or file permissions have not been set
correctly."

Please check the file permissions. The file must be writable.


Open directories
----------------
"Directory ... could not be read!
Either directory does not exist, or directory permissions have not
been set correctly."

The directory must be readable.


Missing requested values
------------------------
"The following field(s) must be filled in: ..."

Please enter the requested value(s).


Duplicate keywords
------------------
"Please avoid duplicate keywords: ..."

Keywords/search phrases may not be repeated. You may, however, enter
different versions, e.g.:

  online travel
  online travel booking
  online travel bookings


Submission URLs
---------------
"URL ... is not valid!"

Please enter a valid URL.
Example: http://www.yourshadowdomain.com/page1.html


======================================================================

UPDATES + PROGRAM CHRONOLOGY
============================

2006-02-12: official release of version 2.02.01
2006-02-02: official release of version 2.01.01
2005-12-10: beta release of version 2.01.01 (non-public)
2002-12-20: official release of version 1.01.01
2002-10-28: beta release of version 1.01.01


======================================================================

CONTACT + SUPPORT
=================

Please send email to: techsupport@fantomaster.com

Corporate contact info:

For an overview of our business hours in relation to your specific
time zone, please see:
< http://fantomaster.com/index2.html#hours >


######################################################################
# (c) Copyright 2006 by fantomaster.com                              #
# All rights reserved.                                               #
# Copying, modification or distribution requires permission          #
# in writing by copyright holder.                                    #
# fantomas shadowMaker(TM) is the protected trade mark of            #
# fantomaster.com GmbH.                                              #
# URL: < http://fantomaster.com >                                    #
# ------------------------------------------------------------------ #
# OEM Licensing:                                                     #
######################################################################