Twitter Search is Now 3x Faster


In the spring of 2010, the search team at Twitter started to rewrite our search engine in order to serve our ever-growing traffic, improve the end-user latency and availability of our service, and enable rapid development of new search features. As part of the effort, we launched a new real-time search engine, changing our back-end from MySQL to a real-time version of Lucene. Last week, we launched a replacement for our Ruby-on-Rails front-end: a Java server we call Blender. We are pleased to announce that this change has produced a 3x drop in search latencies and will enable us to rapidly iterate on search features in the coming months.

PERFORMANCE GAINS

Twitter search is one of the most heavily-trafficked search engines in the world, serving over one billion queries per day. The week before we deployed Blender, the #tsunami in Japan contributed to a significant increase in query load and a related spike in search latencies. Following the launch of Blender, our 95th percentile latencies dropped roughly 3x, from 800ms to 250ms, and CPU load on our front-end servers was cut in half. We now have the capacity to serve 10x the number of requests per machine. This means we can support the same number of requests with fewer servers, reducing our front-end service costs.

95th Percentile Search API Latencies Before and After Blender Launch

TWITTER’S IMPROVED SEARCH ARCHITECTURE

To understand the performance gains, you must first understand the inefficiencies of our former Ruby-on-Rails front-end servers. The front ends ran a fixed number of single-threaded Rails worker processes, each of which did the following:

  • parsed queries

  • queried index servers synchronously

  • aggregated and rendered results

We have long known that the model of synchronous request processing uses our CPUs inefficiently. Over time, we had also accrued significant technical debt in our Ruby code base, making it hard to add features and improve the reliability of our search engine. Blender addresses these issues by:

  1. Creating a fully asynchronous aggregation service. No thread waits on network I/O to complete.

  2. Aggregating results from back-end services, for example, the real-time, top tweet, and geo indices.

  3. Elegantly dealing with dependencies between services. Workflows automatically handle transitive dependencies between back-end services.
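The contrast with the synchronous Rails workers can be illustrated with a minimal sketch. It is written in Python with asyncio for brevity, not in Blender's Java, and everything in it is an assumption for illustration: the `query_backend` and `blend` functions, the back-end names, and the simulated delays are hypothetical. All three back-ends are queried at once and no thread blocks while the responses are in flight.

```python
import asyncio


async def query_backend(name: str, query: str, delay: float) -> dict:
    """Hypothetical stand-in for a non-blocking call to one back-end index."""
    await asyncio.sleep(delay)  # simulates network I/O without blocking a thread
    return {"service": name, "hits": [f"{name}:{query}"]}


async def blend(query: str) -> list:
    """Fan the query out to every back-end concurrently, then merge the hits."""
    # Back-end names and delays are made up for this sketch.
    backends = {"realtime": 0.03, "top_tweets": 0.02, "geo": 0.01}
    responses = await asyncio.gather(
        *(query_backend(name, query, delay) for name, delay in backends.items())
    )
    merged = []
    for response in responses:  # gather() returns results in request order
        merged.extend(response["hits"])
    return merged


print(asyncio.run(blend("tsunami")))
```

The total wait is close to the slowest back-end rather than the sum of all three, which is the latency benefit of issuing the requests in parallel.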

The following diagram shows the architecture of Twitter’s search engine. Queries from the website, API, or internal clients at Twitter are issued to Blender via a hardware load balancer. Blender parses the query and then issues it to back-end services, using workflows to handle dependencies between the services. Finally, results from the services are merged and rendered in the appropriate language for the client.

Twitter Search Architecture with Blender

BLENDER OVERVIEW

Blender is a Thrift and HTTP service built on Netty, a highly-scalable NIO client-server library written in Java that enables the development of a variety of protocol servers and clients quickly and easily. We chose Netty over some of its other competitors, like Mina and Jetty, because it has a cleaner API, better documentation and, more importantly, because several other projects at Twitter are using this framework. To make Netty work with Thrift, we wrote a simple Thrift codec that decodes the incoming Thrift request from Netty’s channel buffer when it is read from the socket, and encodes the outgoing Thrift response when it is written to the socket.

Netty defines a key abstraction, called a Channel, that encapsulates a connection to a network socket and provides an interface for I/O operations such as read, write, connect, and bind. All channel I/O operations are asynchronous in nature. This means any I/O call returns immediately with a ChannelFuture instance that reports whether the requested I/O operation succeeded, failed, or was canceled.
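The future-based style described here can be sketched with Python's standard `concurrent.futures.Future`, used as a loose analogue of a ChannelFuture rather than Netty's actual API. The `start_io` and `describe` helpers are hypothetical names introduced for this example. The caller gets a future back immediately, and a callback fires once the operation succeeds, fails, or is canceled.

```python
from concurrent.futures import Future


def describe(future: Future) -> str:
    """Classify a completed future into one of the three possible outcomes."""
    if future.cancelled():
        return "cancelled"
    if future.exception() is not None:
        return "failed"
    return "succeeded"


def start_io(log: list) -> Future:
    """Hypothetical I/O call: returns a pending future without waiting."""
    future = Future()
    # The listener runs when the future completes, whatever the outcome.
    future.add_done_callback(lambda f: log.append(describe(f)))
    return future


log = []
ok, failed, cancelled = start_io(log), start_io(log), start_io(log)

# The caller already holds all three futures. Here we play the role of
# the event loop and complete them afterwards.
ok.set_result(b"written")
failed.set_exception(IOError("connection reset"))
cancelled.cancel()

print(log)  # ['succeeded', 'failed', 'cancelled']
```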

When a Netty server accepts a new connection, it creates a new channel pipeline to process it. A channel pipeline is nothing but a sequence of channel handlers that implement the business logic needed to process the request. In the next section, we show how Blender maps these pipelines to query processing workflows.
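As a rough illustration of the pipeline idea (a simplified model in Python, not Netty's ChannelPipeline API), a pipeline can be thought of as an ordered list of handlers, each taking the output of the previous one. The `Pipeline` class and the `decode`, `search`, and `encode` handlers are hypothetical.

```python
from typing import Callable, List

Handler = Callable[[dict], dict]


class Pipeline:
    """An ordered sequence of handlers applied to each request."""

    def __init__(self, handlers: List[Handler]):
        self.handlers = list(handlers)

    def process(self, request: dict) -> dict:
        for handler in self.handlers:
            request = handler(request)  # each handler feeds the next
        return request


def decode(request: dict) -> dict:
    """Turn the raw query string into lowercase terms."""
    return {**request, "terms": request["raw"].lower().split()}


def search(request: dict) -> dict:
    """Pretend to look up one hit per term."""
    return {**request, "hits": [f"tweet about {t}" for t in request["terms"]]}


def encode(request: dict) -> dict:
    """Render the hits into a response body."""
    return {"body": "\n".join(request["hits"])}


pipeline = Pipeline([decode, search, encode])
response = pipeline.process({"raw": "Blender Netty"})
print(response["body"])
```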

WORKFLOW FRAMEWORK

In Blender, a workflow is a set of back-end services with dependencies between them, which must be processed to serve an incoming request. Blender automatically resolves dependencies between services. For example, if service A depends on service B, B is queried first and its results are passed to A. It is convenient to represent workflows as directed acyclic graphs (see below).

Sample Blender Workflow with 6 Back-end Services

In the sample workflow above, we have 6 services {s1, s2, s3, s4, s5, s6} with dependencies between them. The directed edge from s3 to s1 means that s3 must be called before calling s1 because s1 needs the results from s3. Given such a workflow, the Blender framework performs a topological sort on the DAG to determine the total ordering of services, which is the order in which they must be called. The execution order of the above workflow would be {(s3, s4), (s1, s5, s6), (s2)}. This means s3 and s4 can be called in parallel in the first batch, and once their responses are returned, s1, s5, and s6 can be called in parallel in the next batch, before finally calling s2.
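One way to compute such batches is a level-by-level topological sort (Kahn's algorithm, grouping every service whose dependencies are already satisfied). The sketch below is in Python and is not Blender's code. Only the s3-to-s1 edge comes from the text. The other edges are assumptions chosen to reproduce the execution order given above, and `execution_batches` is a hypothetical name.

```python
from typing import Dict, List, Set


def execution_batches(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into batches; each batch depends only on earlier ones."""
    remaining = {svc: set(deps) for svc, deps in depends_on.items()}
    batches = []
    while remaining:
        # Every service with no unresolved dependency can run in parallel.
        ready = sorted(svc for svc, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected: workflow is not a DAG")
        batches.append(ready)
        for svc in ready:
            del remaining[svc]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches


# Maps each service to the services it needs results from.
# Only "s1 depends on s3" is stated in the text; the rest is assumed.
workflow = {
    "s1": {"s3"},
    "s2": {"s1", "s5"},
    "s3": set(),
    "s4": set(),
    "s5": {"s3", "s4"},
    "s6": {"s4"},
}

print(execution_batches(workflow))
# [['s3', 's4'], ['s1', 's5', 's6'], ['s2']]
```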

Once Blender determines the execution order of a workflow, it is mapped to a Netty pipeline. This pipeline is a sequence of handlers that the request needs to pass through for processing.

MULTIPLEXING INCOMING REQUESTS

Because workflows are mapped to Netty pipelines in Blender, we needed to route incoming client requests to the appropriate pipeline. For this, we built a proxy layer that multiplexes and routes client requests to pipelines as follows:

  • When a remote Thrift client opens a persistent connection to Blender, the proxy layer creates a map of local clients, one for each of the local workflow servers. Note that all local workflow servers are running inside Blender’s JVM process and are instantiated when the Blender process starts.

  • When the request arrives at the socket, the proxy layer reads it, figures out which workflow is requested, and routes it to the appropriate workflow server.

  • Similarly, when the response arrives from the local workflow server, the proxy reads it and writes the response back to the remote client.

We made use of Netty’s event-driven model to accomplish all the above tasks asynchronously so that no thread waits on I/O.
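The routing step can be sketched roughly as follows, again in Python with asyncio instead of Netty. The `Proxy` class, the workflow names, and the request fields (`id`, `workflow`, `query`) are all hypothetical. The proxy holds a map of in-process workflow servers, looks up the one each request names, and returns the response tagged with the request's id.

```python
import asyncio


async def web_workflow(query: str) -> str:
    """Hypothetical in-process workflow server for website queries."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"web results for {query}"


async def api_workflow(query: str) -> str:
    """Hypothetical in-process workflow server for API queries."""
    await asyncio.sleep(0)
    return f"api results for {query}"


class Proxy:
    """Routes each incoming request to the workflow server it names."""

    def __init__(self, workflow_servers: dict):
        self.workflow_servers = workflow_servers

    async def handle(self, request: dict) -> dict:
        workflow = self.workflow_servers.get(request["workflow"])
        if workflow is None:
            return {"id": request["id"], "error": "unknown workflow"}
        result = await workflow(request["query"])
        return {"id": request["id"], "result": result}


async def serve(requests: list) -> list:
    """Handle many requests concurrently over one shared proxy."""
    proxy = Proxy({"web": web_workflow, "api": api_workflow})
    return await asyncio.gather(*(proxy.handle(r) for r in requests))


requests = [
    {"id": 1, "workflow": "web", "query": "tsunami"},
    {"id": 2, "workflow": "api", "query": "netty"},
    {"id": 3, "workflow": "images", "query": "cats"},
]
for response in asyncio.run(serve(requests)):
    print(response)
```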

DISPATCHING BACK-END REQUESTS

Once the query arrives at a workflow pipeline, it passes through the sequence of service handlers as defined by the workflow. Each service handler constructs the appropriate back-end request for that query and issues it to the remote server. For example, the real-time service handler constructs a real-time search request and issues it to one or more real-time index servers asynchronously. We are using the Twitter Commons library (recently open-sourced!) to provide connection-pool management, load-balancing, and dead host detection.

The I/O thread that is processing the query is freed when all the back-end requests have been dispatched. A timer thread checks every few milliseconds to see if any of the back-end responses have returned from remote servers and sets a flag indicating if the request succeeded, timed out, or failed. We maintain one object over the lifetime of the search query to manage this type of data.

Successful responses are aggregated and passed to the next batch of service handlers in the workflow pipeline. When all responses from the first batch have arrived, the second batch of asynchronous requests is made. This process is repeated until we have completed the workflow or the workflow’s timeout has elapsed.
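The batch-by-batch dispatch described in this section can be approximated with the sketch below. It is in Python and makes several assumptions. It uses asyncio's per-call timeouts instead of Blender's timer thread, and it omits the overall workflow timeout. The `QueryState` class, the service functions, and the status strings are hypothetical names. One state object lives for the whole query and records whether each back-end succeeded, timed out, or failed.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List


@dataclass
class QueryState:
    """One object per search query, tracking each back-end's outcome."""
    query: str
    status: Dict[str, str] = field(default_factory=dict)
    results: Dict[str, object] = field(default_factory=dict)


Service = Callable[[QueryState], Awaitable[object]]


async def call_service(name: str, service: Service,
                       state: QueryState, timeout: float) -> None:
    """Run one back-end call and record its outcome on the shared state."""
    try:
        state.results[name] = await asyncio.wait_for(service(state), timeout)
        state.status[name] = "succeeded"
    except asyncio.TimeoutError:
        state.status[name] = "timed out"
    except Exception:
        state.status[name] = "failed"


async def run_workflow(batches: List[List[str]], services: Dict[str, Service],
                       state: QueryState, timeout: float = 0.1) -> QueryState:
    """Dispatch each batch in parallel; start the next once all have settled."""
    for batch in batches:
        await asyncio.gather(
            *(call_service(name, services[name], state, timeout) for name in batch)
        )
    return state


async def realtime(state: QueryState) -> list:
    await asyncio.sleep(0.01)
    return ["t1", "t2"]


async def geo(state: QueryState) -> list:
    await asyncio.sleep(1.0)  # deliberately slower than the timeout
    return ["g1"]


async def top_tweets(state: QueryState) -> list:
    raise RuntimeError("index unavailable")


async def ranker(state: QueryState) -> list:
    # Second batch: only sees results from calls that succeeded earlier.
    return sorted(state.results.get("realtime", []), reverse=True)


services = {"realtime": realtime, "geo": geo,
            "top_tweets": top_tweets, "ranker": ranker}
batches = [["realtime", "geo", "top_tweets"], ["ranker"]]
state = asyncio.run(run_workflow(batches, services, QueryState("tsunami"), timeout=0.05))
print(state.status)
```

In this run the slow `geo` call times out and `top_tweets` fails, yet the workflow still completes, and `ranker` works with the results that did arrive.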

As you can see, throughout the execution of a workflow, no thread busy-waits on I/O. This allows us to efficiently use the CPU on our Blender machines and handle a large number of concurrent requests. We also save on latency as we can execute most requests to back-end services in parallel.

BLENDER DEPLOYMENT AND FUTURE WORK

To ensure a high quality of service while introducing Blender into our system, we are using the old Ruby on Rails front-end servers as proxies for routing Thrift requests to our Blender cluster. Using the old front-end servers as proxies allows us to provide a consistent user experience while making significant changes to the underlying technology. In the next phase of our deployment, we will eliminate Ruby on Rails entirely from the search stack, connecting users directly to Blender and potentially reducing latencies even further.

 

Get started with Personal Web Pages


Introduction

Personal Web Pages let you share your life with the world. Find out what features are available, how to check disk space usage, what the Rules of the Road are and more.

All about Personal Web Pages

Start a blog, link to your favorite web sites, and post hundreds and hundreds of pictures, videos and documents. You can also upload and download files anywhere in the world you have Internet access.

You can do all this and more when you create your own website with Personal Web Pages. Free to any of our subscribers, your Personal Web Page comes with the latest features and 1GB (gigabyte) of online storage space.

Features included with Personal Web Pages:

  • Personal Page Name
    Provide your website with a personalized, easy-to-remember URL (web address). You have two choices:

  • Website Builder
    Build an entire website quickly and easily. Add the site features you want, choose your color scheme, and much more!

  • Address List
    Among other things, you can easily create a list of members for your community, and set who’ll receive copies of your e-newsletters.

  • Calendar
    Create and manage a monthly calendar that can be viewed by visitors. It’s a great way to let people know what’s happening at your site throughout the year.

  • Counter
    Keep track of how many visitors your site receives.

  • FrontPage Extensions
    Add support for Microsoft FrontPage Server Extensions to your site. Use Microsoft FrontPage to upload and manage your site, and add the ability to use FrontPage 2002 Extensions such as the hit counter, site search, discussion webs, and form support.

  • Guestbook
    Receive feedback from visitors to your site and help identify people with similar interests. Visitors can sign in, leave comments, and even view what others have said about your site.

  • Mailform
    Visitors can easily contact you via email on your Web site.

  • Newsletter
    Communicate with members of similar interests: gather articles, edit them, and then send a completed newsletter to your subscribers.

  • Polling Booth
    Poll visitors and provide up-to-the-minute results, right on your website. It’s a fast and simple way to find out what others are thinking.

Disk space usage

There is so much you can do, and with 1GB of free online storage space, you can do it all! We also make it easy to keep track of your disk space usage.

To check disk space usage:

The more disk space you use, the higher the percentage shown and the further the green bar moves to the right. If you need to free up space, don’t delete online files. Instead, download files to your hard drive.

Personal Web Pages’ Rules of the Road

Before you get started, take a moment to review our Rules of the Road:

  • Personal Web Pages allow any XFINITY Internet Service member to create and publish a 25MB web page per Email account, accessible via the Internet.

  • Your use of this feature of XFINITY Internet Service is governed by the same Terms and Conditions as your use of the rest of the Service, including the policies governing acceptable use and conduct. You can click on Terms Of Service at any time to review them.

  • You are solely and totally responsible for the content on your web pages and your use of this feature. We will not systematically review that content, but we have the discretion to block access to your pages, remove content on your pages, or suspend or terminate your account if the content violates the Terms and Conditions or Acceptable Use Policy.

  • Copyrighted material is the property of the owner and the owner can take you to court if you make a digital copy (say, on your personal home page) without permission. Third parties can similarly own and legally restrict your copying of trademarks, photographs, and other intellectual property. So, please, stick to images and other content that you know you can use without infringing someone else’s rights to that content. When in doubt, seek the permission of the owner.

  • If you wish to include your email address on your page, please be aware that web crawlers can visit your page, pull the email address and proceed to use it in mass emailings or spam that you may not want.

  • Keep a copy of your web page on either your hard drive or on a disk for any possible future changes/revisions.

If you’re subject to a Service plan that permits business use of the Service, you may include business information on your web pages, within the 1GB limit, subject to the following:

Business Information Includes:

  • Information about a for-profit activity (yours or someone else’s) if you stand to gain financially, either directly or indirectly, from publishing that information. For example, a purely personal recommendation about your favorite product or place is not commercial information. Information about a product or service that you sell would be.

  • A link to any information about a for-profit activity if you stand to gain financially, directly or indirectly, from that link. The link can be a banner ad, an icon, or text.

Rules for Business Information:

  • Comcast does not endorse or guarantee any product or service mentioned on your web pages and you may not state or imply any such endorsement or guarantee.

  • The XFINITY Internet Service Subscriber Agreement and policies (also sometimes known as the Terms and Conditions) continue to apply to any business information you put on your pages and to your use of the XFINITY Internet Service to create, edit, and publish business information. You are solely responsible for the content on your pages, including any commercial information. Comcast does not assume any responsibility for any loss, damage, or cost due to your use or inability to use the Service, including any inability of others to view your pages. Please refer to the Terms of Service for details.

  • YOU AGREE TO DEFEND, INDEMNIFY, AND HOLD HARMLESS COMCAST AND THIRD PARTIES WHO CONTRIBUTE TO THE SERVICE FROM ANY LOSS, DAMAGE, OR COST (INCLUDING ATTORNEYS’ FEES) RESULTING FROM YOUR VIOLATION OF THESE RULES, ANY ACTIVITY RELATED TO YOUR WEB PAGES, OR ANY PRODUCT, SERVICE, OR INFORMATION DESCRIBED OR INCLUDED ON THE PAGES.

Social Media Marketing Services


Social Media Marketing Services – The Necessary Evil for Businesses Across the Globe

There was a time, not too long ago, when business was considered to be an ecosystem. Now, it has become nothing short of a full-fledged battleground. As if the ‘sales wars’ were not enough, yet another evil has plagued the pristine world of customer satisfaction, and it is popularly known as ‘social media’. The popularity of social media and networking websites has increased so much that companies that do not use SMM services are looked at with disdain as obsolete and ancient. Digital marketing is a boon and a blessing in disguise, make no mistake about that. However, the exponential manner in which it has grown and the desperation with which businesses want to join the social media bandwagon just to ride the tide are alarming, to say the least.

Article Source: http://EzineArticles.com/8456335

SEO Strategy Supports Your Brand


7 Tips to Ensure Your SEO Strategy Supports Your Brand


SEO isn’t just about ranking for keywords. Many people fall into a keyword obsession rut and seem to forget that while keywords are important, SEO at its core is about indexation, crawlability, and creating a site that is effectively traversed by crawling search bots.

Many people also often forget how, when done effectively, SEO supports their brand. How your brand is displayed in search, as well as the many other online properties where you have a presence, is commonly forgotten in the race toward powerful rankings for desired non-branded keywords. Don’t get me wrong, I love to see non-branded organic visibility rise, but we can’t forget “The Brand.”


Title Tags & SEO: 3 Golden Rules



It used to be conventional wisdom that title tags should be between 65 and 70 characters in length. Early this year Google began experimenting with a new search layout design that reduces the number of characters shown to lengths between 48 and 62 characters.

The title tag remains an important part of SEO for one basic reason: it is the overall label for the content of a page, and because the number of characters is limited, there isn’t much room to work with.


Got SEO Basics? 5 Tips To Boost Your Organic CTR



There’s no shortage of new digital marketing channels these days — and while they are innovative, exciting, and fun to experiment with, they can also distract you from SEO basics that can deliver performance gains. Your organic click-through rate (CTR) is a great example.

A simple page title tweak could take you from driving 2% of consumers to your landing page, to driving 20%! But today, your organic CTR can be affected by several different elements. Below we’ll take a look at five factors in your control:
