Internet Archive, Wayback Machine, Alexa Internet and HTTP Archive

Last update : October 11, 2014

The Internet Archive (archive.org) is a non-profit digital library with the stated mission of universal access to all knowledge. The Internet Archive is a member of the International Internet Preservation Consortium (IIPC) and the American Library Association (ALA).

The best-known service of the Internet Archive is the Wayback Machine, which allows archives of the World Wide Web to be searched and accessed. You can browse through over 150 billion web pages archived from 1996 to a few months ago.
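As a minimal sketch of how an archived page can be addressed, the function below builds a link of the form https://web.archive.org/web/<timestamp>/<url>, where the timestamp is written as YYYYMMDDhhmmss. The function name, the target page and the date are illustrative assumptions.

```javascript
// Sketch: build a Wayback Machine URL for a page as archived near a given date.
// Assumes the public URL pattern https://web.archive.org/web/<timestamp>/<url>,
// where <timestamp> is a 14-digit UTC value (YYYYMMDDhhmmss).
function waybackUrl(pageUrl, date) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const timestamp =
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is zero-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds());
  return "https://web.archive.org/web/" + timestamp + "/" + pageUrl;
}

// Example with a hypothetical target page and date (October 11, 2014, UTC).
const archived = waybackUrl("http://example.com/", new Date(Date.UTC(2014, 9, 11)));
console.log(archived);
```

If no capture exists for the exact timestamp, the archive normally redirects to the closest one it has, so the date only needs to be approximate.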

Brewster Kahle founded the Archive in 1996, at the same time that he started the for-profit web crawling company Alexa Internet. The company’s name was chosen in homage to the Library of Alexandria, the largest and most significant library of the ancient world. In 1999, Alexa was acquired by Amazon.com. Alexa ranks sites based on tracking information from users. Its database served as the basis for the creation of the Wayback Machine, and Alexa continues to supply the Internet Archive with web crawls.

Alexa also provides the data for the HTTP Archive, created in 2010 by Steve Souders. The HTTP Archive records how the digitized content of web pages is constructed and served. It is a permanent repository of web performance information such as page size, failed requests, and technologies used.
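To illustrate the kind of performance information described above, here is a small sketch that computes page size and failed requests from a single page-load record. It assumes the record uses the field names of the HAR 1.2 format (log.entries, response.status, response.bodySize). The function name and the sample data are invented for illustration and are not taken from the archive.

```javascript
// Sketch: summarize one page load from a HAR-style record.
// Field names follow the HAR 1.2 format; the sample data is invented.
function summarizePage(har) {
  const entries = har.log.entries;
  let totalBytes = 0;
  let failedRequests = 0;
  for (const entry of entries) {
    const { status, bodySize } = entry.response;
    // HAR uses -1 when the body size is unknown, so only count positive sizes.
    if (bodySize > 0) totalBytes += bodySize;
    // Treat "no response" (status 0) and 4xx/5xx responses as failed.
    if (status === 0 || status >= 400) failedRequests += 1;
  }
  return { requests: entries.length, totalBytes, failedRequests };
}

const sampleHar = {
  log: {
    entries: [
      { request: { url: "http://example.com/" }, response: { status: 200, bodySize: 12000 } },
      { request: { url: "http://example.com/style.css" }, response: { status: 200, bodySize: 3000 } },
      { request: { url: "http://example.com/missing.png" }, response: { status: 404, bodySize: 500 } },
    ],
  },
};

console.log(summarizePage(sampleHar));
```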

Other projects of the Internet Archive are listed below :

  • Open Library : catalog of 23 million books, text of about 1.6 million public domain books
  • Education Resources Library : hundreds of free courses, video lectures, and supplemental materials from universities
  • Archive-It : web archiving service that allows institutions and individuals to build and preserve collections of digital content
  • NASA images : more than 100,000 items of NASA’s image, video, and audio collections
  • Audio Collection : over 100,000 concert recordings from independent artists and other selected audio files
  • Text Collection : digitized books from various libraries around the world as well as many special collections
  • Software Archive : access to all kinds of rare or difficult to find, legally downloadable software
  • Moving Image Collection : thousands of free movies, films, and videos
  • TV News : more than 366,000 broadcasts

SEO marketing and SERPs

Last update : June 29, 2013
SEO (search engine optimization) is the process of improving the visibility of a website or a web page in a search engine’s natural search results (natural = unpaid, organic, algorithmic). A SERP (search engine results page) is the listing of results returned by a search engine in response to a keyword query.

For WordPress, the leading content management system for blogs, there are a number of effective plugins that make it easy to optimize your posts.

The deliberate manipulation of search engine indexes is called spamdexing. Common spamdexing techniques fall into two broad classes : content spam and link spam. See the related post for information about PageRank, content farms, search quality and black hat SEO.

More information about SEO and related topics is available at the following links :

eyePlorer : the knowledge machine

Last update : August 9, 2013

eyePlorer by Vionto

eyePlorer is (or was) a graphical knowledge engine created by vionto®. Current search engines only present lists of links and documents; with eyePlorer, however, you are able to locate relevant information and connections instantly. Facts and relationships between terms and concepts are visualised in an interactive application. The knowledge machines built by vionto® employ sophisticated semantic techniques in order to analyse the meaning of sentences and texts. The benefit for the user is that he or she can work with individual facts instead of just long documents.

The user does not work with documents but with knowledge and facts in a graphical, interactive, almost dialogue-like kind of way. Knowledge is visually arranged in different categories. vionto® knowledge machines are based on semantic analyses derived from cognitive science, brain research and computational linguistics. vionto® relies on a robust language technology platform and sophisticated linguistic resources such as, for example, ontologies and thesauri. Currently eyePlorer processes the English and German Wikipedia as well as MEDLINE/PubMed.

In the circular area on the left-hand side, eyePlorer presents eyeSpots, which represent terms that are related to a search topic. The exact nature of these connections can be displayed by pointing at or clicking on an eyeSpot. A small window, the eyeTip, opens and displays one or more facts from the knowledge base that document the relation. The circular area filled with eyeSpots that relate to a certain search term is called an eyeMap. To display connections between eyeSpots, double-click on an eyeSpot: lines will appear between that eyeSpot and the eyeSpots that are semantically related. A click on one of these lines displays the associated facts taken from the knowledge base.

EyeSpots are associated with various categories (people, countries, organizations, time, society, work, science & technology, …) visualized with different colors. The categorisation is a procedure that is carried out automatically. A double click on an empty area of a category expands it and shows only this category along with all its subcategories.

A dynamic link to eyePlorer can be added to a website to visualize search terms.

vionto® filed for U.S. patent registration of the eyePlorer technology. The eyePlorer visualizes knowledge graphs (k-graphs) derived from various contents that can be interactively explored.

vionto GmbH was founded in December 2008 in Berlin by Ralf von Grafenstein (Diplom-Kaufmann) and Dr. Martin C. Hirsch (neurobiologist and brain researcher). The first version of eyePlorer went online in February 2009 (see Frankfurter Allgemeine Feuilleton : Google-Herausforderer eyePlorer – Die Welt ist doch eine Scheibe). Several prestigious prizes of the internet sector have been awarded to vionto® (SUMA award, ECO Internet award, Red Herring Europe Top 100 2009 Award, …).

One year later, however, the prestigious project came to an inglorious end with the bankruptcy of vionto GmbH (see SpeedX Blog : Verglüht). Its predecessor, semgine GmbH, had met the same fate. The successor seems to be medx GmbH (diagnostic reasoning); the URL eyeplorer.com now redirects to its site.

Google Custom Search

Google Search

Google offers webmasters a custom search engine (beta version) for creating a local search tool on a website or a blog. A quick and easy way to do this is to integrate JavaScript code provided by Google into your web page. The search engine can be customized to include more sites or to adapt the style of the results pages to the style of the website. Google provides tutorials, FAQs, developer documentation and featured examples to help webmasters design the search tool.

Google’s “Terms of Use” state that you may not in any way frame, cache or modify the Results produced by the Google search engine. The results pages include advertisements placed by Google. For enterprises wanting ad-free results pages, Google offers various price plans for the Google site search.

A solution used in the past by several developers relied on JavaScript code that opens a small prompt window for a local search on a website. An example for searching the saraproft.lu website is given below:

=======================================

<!-- Link that prompts for a search term, then opens a Google search restricted to saraproft.lu -->
<p><a href="javascript:(function () {
  var p = prompt('Enter text to search the saraproft.lu site via Google Luxembourg.', '');
  if (p) {
    document.location.href = 'http://www.google.lu/search?q=' + encodeURIComponent('site:saraproft.lu ' + p);
  }
})();">Search</a></p>

=======================================

This solution conflicts with the security mechanisms of newer browsers and is no longer recommended.
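One way to avoid the inline javascript: link is to keep the query-building logic in an ordinary function and call it from a regular form or button handler. The sketch below only builds the search URL. The function name and the default host are illustrative assumptions, and it uses encodeURIComponent instead of the deprecated escape().

```javascript
// Sketch: build a Google search URL restricted to one site.
// The function name and the default host are illustrative assumptions.
function siteSearchUrl(site, terms, host = "www.google.com") {
  const query = "site:" + site + " " + terms.trim();
  // encodeURIComponent replaces the deprecated escape() and handles non-ASCII input.
  return "https://" + host + "/search?q=" + encodeURIComponent(query);
}

console.log(siteSearchUrl("saraproft.lu", "art gallery", "www.google.lu"));
```

In a page, the returned URL could be assigned to window.location.href from a form's submit handler, which keeps the markup free of script-bearing links.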

World Wide Web, World Live Web, World Life Web

In 1994, in the wake of Tim Berners-Lee’s work, the World Wide Web was officially born: a global web, as wide in its dimensions as in its contents. Over the years, these contents have literally exploded, making search engines necessary to sort out this fertile chaos on the principle of classification ‘by relevance’. The main features of this first documentary age of the web are the domain name system (DNS), used to identify and classify web sites and to address documents, and the “http protocol” (hypertext transfer protocol), used to retrieve them.

Then came the World Live Web, an instantaneous subset of the World Wide Web that delivers the latest published information in real time. The Google News service was one of the pioneers of this second documentary age, which also covers what is called micro-content (citizen media), e.g. comments on blogs. Specialised search engines like Technorati are integrated with the tools that power the blogosphere and are able to index new content within ten minutes. According to Technorati data, over 175,000 new blogs are created every day. As of April 2008, Technorati was tracking more than 100 million blogs and over 250 million pieces of tagged social media. For instance, searching for artgallery.lu in Technorati gave more than 100 results.

We are now entering a third documentary age, the World Life Web, driven in particular by the extraordinary boom of social networks (Facebook, MySpace) and of virtual worlds (Second Life). The main issues of this new age are sociability and the indexable and remixable nature of our digital identity, as well as the traces it leaves on the network.

Olivier Ertzscheid, a lecturer and researcher (Maître de Conférences) in information and communication sciences at the Infocom department of the IUT de la Roche sur Yon (Université de Nantes), has published a short educational text on this subject on his personal blog, affordance.info.

Blogs, Blogrolls, Blogosphere, RSS newsfeeds, Permalinks

A weblog, or “blog”, is a personal journal on the Web that is updated frequently, most often displaying its material in journal-like chronological dated entries or posts. Weblogs cover as many different topics, and express as many opinions, as there are people writing them. Weblogs are different from traditional media. Bloggers (people who write blogs) tend to be more opinionated, niche-focused, and partisan than journalists, who strive for editorial objectivity. Many weblogs allow readers to write a reaction (comment) to what was written in the blog entry. A blogroll is a list of blogs and bloggers that a particular blog author finds influential or interesting. The online community of bloggers, their writings and the comments is called the blogosphere.

Weblogs usually offer RSS feeds (a file format that allows anyone with a website to easily “syndicate” their content) to make part of their content (excerpts and links back to the originating website) available for other sites to use and publish. Excerpts are optional hand-crafted summaries of the content. Permalinks (permanent identifiers of a specific weblog post or article) are the preferred way to capture specific references to posts or articles in a blog. Inbound links are hyperlinks from other sources citing that weblog. Outbound links are hyperlinks from the weblog to outside sources. The leading monitor of the world of weblogs is Technorati, a real-time search engine that is the largest source of fresh information about the global and local conversations going on all across the Web.
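As a rough sketch of how a post's excerpt and permalink appear in a feed, the code below renders one post as an RSS 2.0 item. The element names (item, title, link, guid with isPermaLink, description, pubDate) come from RSS 2.0. The shape of the post object, the function names and the sample post are hypothetical.

```javascript
// Sketch: render one blog post as an RSS 2.0 <item>.
// The post object shape (title, permalink, excerpt, published) is hypothetical.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function rssItem(post) {
  return [
    "<item>",
    "  <title>" + escapeXml(post.title) + "</title>",
    "  <link>" + escapeXml(post.permalink) + "</link>",
    // isPermaLink="true" marks the guid as a stable URL for this post.
    '  <guid isPermaLink="true">' + escapeXml(post.permalink) + "</guid>",
    // The excerpt is the short summary that other sites may republish.
    "  <description>" + escapeXml(post.excerpt) + "</description>",
    "  <pubDate>" + post.published.toUTCString() + "</pubDate>",
    "</item>",
  ].join("\n");
}

// Invented sample post for illustration.
const samplePost = {
  title: "Blogs & Blogrolls",
  permalink: "http://example.com/2008/04/blogs-and-blogrolls/",
  excerpt: "A short summary of the post.",
  published: new Date(Date.UTC(2008, 3, 1)),
};

console.log(rssItem(samplePost));
```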

Open Directory Project

The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors. The web continues to grow at staggering rates. Automated search engines are increasingly unable to turn up useful results to search queries. The Open Directory provides the means for the Internet to organize itself. As the Internet grows, so does the number of net-citizens. These citizens can each organize a small portion of the web and present it back to the rest of the population, culling out the bad and useless and keeping only the best content.

The Open Directory was founded in the spirit of the Open Source movement, and is the only major directory that is 100% free. There is not, nor will there ever be, a cost to submit a site to the directory or to use the directory’s data. The Open Directory data is made available for free to anyone who agrees to comply with the free use license.