TAO: The power of the graph
By Mark Marchukov, Wednesday, 26 June 2013 at 12:00am
Facebook puts an extremely demanding workload on its data backend. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends - the list goes on. The high degree of output customization, combined with a high update rate of a typical user's News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds.
This challenge is made more difficult because the data set is not easily partitionable, and by the tendency of some items, such as photos of celebrities, to have request rates that can spike significantly. Multiply this by the millions of times per second this kind of highly customized data set must be delivered to users, and you have a constantly changing, read-dominated workload that is incredibly challenging to serve efficiently.
Memcache and MySQL

Facebook has always realized that even the best relational database technology available is a poor match for this challenge unless it is supplemented by a large distributed cache that offloads the persistent store. Memcache has played that role since Mark Zuckerberg installed it on Facebook's Apache web servers back in 2005. As efficient as MySQL is at managing data on disk, the assumptions built into the InnoDB buffer pool algorithms don't match the request pattern of serving the social graph. The spatial locality on ordered data sets that a block cache attempts to exploit is not common in Facebook workloads. Instead, what we call creation time locality dominates the workload - a data item is likely to be accessed if it has been recently created. Another source of mismatch between our workload and the design assumptions of a block cache is the fact that a relatively large percentage of requests are for relations that do not exist - e.g., "Does this user like that story?" is false for most of the stories in a user's News Feed. Given the overall lack of spatial locality, pulling several kilobytes of data into a block cache to answer such queries just pollutes the cache and contributes to the lower overall hit rate in the block cache of a persistent store.
The use of Memcache vastly improved the memory efficiency of caching the social graph and allowed us to scale in a cost-effective way. However, the code that product engineers had to write for storing and retrieving their data became quite complex. Even though Memcache has "cache" in its name, it's really a general-purpose networked in-memory data store with a key-value data model. It will not automatically fill itself on a cache miss or maintain cache consistency. Product engineers had to work with two data stores and very different data models: a large cluster of MySQL servers for storing data persistently in relational tables, and an equally large collection of Memcache servers for storing and serving flat key-value pairs derived (some indirectly) from the results of SQL queries. Even with most of the common chores encapsulated in a data access library, using the memcache-MySQL combination efficiently as a data store required quite a bit of knowledge of system internals on the part of product engineers. Inevitably, some made mistakes that led to bugs, user-visible inconsistencies, and site performance issues. In addition, changing table schemas as products evolved required coordination between engineers and MySQL cluster operators. This slowed down the change-debug-release cycle and didn't fit well with Facebook's "move fast" development philosophy.
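To make the burden concrete, here is a minimal sketch of the cache-aside pattern that code built on the memcache-MySQL combination had to implement by hand. The class and its dict-like stand-ins are hypothetical illustrations, not Facebook's actual data access library; the point is that the fill-on-miss and invalidate-on-write steps are the caller's responsibility, and forgetting either one produces exactly the stale-data bugs described above.

```python
# Hypothetical sketch of a hand-rolled cache-aside read/write path:
# check the cache first, fall back to the database on a miss, then fill
# the cache; writes go to the database and invalidate the cached copy.

class CacheAsideStore:
    def __init__(self, cache, db):
        self.cache = cache  # dict-like stand-in for a Memcache client
        self.db = db        # dict-like stand-in for a MySQL query layer

    def get(self, key):
        value = self.cache.get(key)
        if value is None:                # cache miss: Memcache will not
            value = self.db.get(key)     # fill itself automatically
            if value is not None:
                self.cache[key] = value  # populate for later readers
        return value

    def put(self, key, value):
        self.db[key] = value             # write to the persistent store
        self.cache.pop(key, None)        # invalidate (not update) the cache
```

Keeping these two code paths consistent across every product feature, in the presence of failures and concurrent writers, is the complexity that the abstractions described next were meant to hide.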
Objects and associations

In 2007, a few Facebook engineers set out to define new data storage abstractions that would fit the needs of all but the most demanding features of the site, while hiding most of the complexity of the underlying distributed data store from product engineers. The Objects and Associations API that they created was based on the graph data model and was initially implemented in PHP and ran on Facebook's web servers. It represented data items as nodes (objects), and relationships between them as edges (associations). The API was an immediate success, with several high-profile features, such as likes, pages, and events, implemented entirely on objects and associations, with no direct Memcache or MySQL calls.
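The shape of that data model can be sketched in a few lines. This is an illustrative Python toy, not the original PHP API; the class and method names (`add_object`, `add_assoc`, `assoc_get`) are invented for the example. What it shows is the essential idea: objects are typed nodes keyed by id, and associations are typed, directed edges between ids.

```python
# Illustrative sketch of the graph data model: data items are objects
# (typed nodes with fields), relationships are associations (typed,
# directed edges between object ids). Names are hypothetical.

from collections import defaultdict

class Graph:
    def __init__(self):
        self.objects = {}                # id -> (type, fields)
        self.assocs = defaultdict(list)  # (id1, assoc_type) -> [id2, ...]

    def add_object(self, oid, otype, **fields):
        self.objects[oid] = (otype, fields)

    def add_assoc(self, id1, atype, id2):
        self.assocs[(id1, atype)].append(id2)

    def assoc_get(self, id1, atype):
        return self.assocs[(id1, atype)]

g = Graph()
g.add_object(1, "user", name="Alice")
g.add_object(100, "story", text="Hello")
g.add_assoc(1, "likes", 100)  # a "likes" edge from the user to the story
```

A feature like likes then becomes a single edge lookup rather than a SQL query plus a hand-maintained Memcache entry, which is why product code written against this API needed no direct Memcache or MySQL calls.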
As adoption of the new API grew, several limitations of the client-side implementation became apparent. First, small incremental updates to a list of edges required invalidation of the entire item that stored the list in cache, reducing hit rate. Second, requests operating on a list of edges had to always transfer the entire list from Memcache servers over to the web servers, even if the final result contained only a few edges or was empty. This wasted network bandwidth and CPU cycles. Third, cache consistency was difficult to maintain. Finally, avoiding thundering herds in a purely client-side implementation required a form of distributed coordination that was not available for memcache-backed data at the time.
All those problems could be solved directly by writing a custom distributed service designed around objects and associations. In early 2009, a team of Facebook infrastructure engineers started to work on TAO ("The Associations and Objects"). TAO has now been in production for several years. It runs on a large collection of geographically distributed server clusters. TAO serves thousands of data types and handles over a billion read requests and millions of write requests every second. Before we take a look at its design, let's quickly go over the graph data model and the API that TAO implements.