OpenTSDB-Writing Data

Writing Data

You may want to jump right in and start throwing data into your TSD, but to really take advantage of OpenTSDB's power and flexibility, it is worth pausing to think about your naming schema. After you've done that, you can proceed to pushing data over the Telnet or HTTP APIs, or use an existing tool with OpenTSDB support such as tcollector.
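Before getting into naming, it may help to see what a single data point looks like when it is pushed. The sketch below only builds the payloads and opens no connection. The helper names telnet_put_line and http_put_body are hypothetical. The line layout (put <metric> <timestamp> <value> <tagk=tagv ...>) and the JSON fields metric, timestamp, value and tags follow OpenTSDB's telnet put command and its HTTP /api/put endpoint (available from OpenTSDB 2.0). Check both against the TSD version you run.

```python
import json


def telnet_put_line(metric, timestamp, value, tags):
    """Format one data point as a telnet-style 'put' line (hypothetical helper)."""
    if not tags:
        # OpenTSDB requires at least one tag on every time series.
        raise ValueError("at least one tag is required")
    # Tag order does not matter to the TSD; sorting just makes output deterministic.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def http_put_body(metric, timestamp, value, tags):
    """Build a JSON body shaped like a single /api/put data point (hypothetical helper)."""
    return json.dumps({"metric": metric, "timestamp": timestamp,
                       "value": value, "tags": tags})


line = telnet_put_line("sys.cpu.user", 1356998400, 42.5,
                       {"host": "webserver01", "cpu": "0"})
print(line, end="")  # put sys.cpu.user 1356998400 42.5 cpu=0 host=webserver01
```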

Naming Schema

Many metrics administrators are used to supplying a single name for their time series. For example, systems administrators used to RRD-style systems may name their time series webserver01.sys.cpu.0.user. The name tells us that the time series is recording the amount of time in user space for cpu 0 on webserver01. This works great if you want to retrieve just the user time for that cpu core on that particular web server later on.

But what if the web server has 64 cores and you want to get the average time across all of them? Some systems allow you to specify a wild card such as webserver01.sys.cpu.*.user that would read all 64 files and aggregate the results. Alternatively, you could record a new time series called webserver01.sys.cpu.user.all that represents the same aggregate, but you must now write 64 + 1 different time series. What if you had a thousand web servers and you wanted the average cpu time for all of your servers? You could craft a wild card query like *.sys.cpu.*.user and the system would open all 64,000 files, aggregate the results and return the data. Or you could set up a process to pre-aggregate the data and write it to webservers.sys.cpu.user.all.
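To make the cost of the wildcard approach concrete, here is a minimal sketch over dotted series names. An in-memory dict stands in for the per-series files an RRD-style system would open; the data and the wildcard_average helper are hypothetical. The point is that every name matching the pattern has to be visited, so the work grows with the number of series the pattern covers.

```python
from fnmatch import fnmatchcase

# Hypothetical RRD-style store: one dotted series name -> its latest value.
# 64 cores on webserver01 plus a single core on webserver02.
store = {f"webserver01.sys.cpu.{i}.user": float(i % 4) for i in range(64)}
store["webserver02.sys.cpu.0.user"] = 3.0


def wildcard_average(series, pattern):
    """Visit every series whose name matches a shell-style glob and average them.

    Returns (number_of_matches, average).
    """
    matched = [value for name, value in series.items() if fnmatchcase(name, pattern)]
    if not matched:
        raise ValueError(f"no series match {pattern!r}")
    return len(matched), sum(matched) / len(matched)


print(wildcard_average(store, "webserver01.sys.cpu.*.user"))  # (64, 1.5)
print(wildcard_average(store, "*.sys.cpu.*.user")[0])         # 65
```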

OpenTSDB handles things a bit differently by introducing the idea of 'tags'. Each time series still has a 'metric' name, but it's much more generic, something that can be shared by many unique time series. Instead, the uniqueness comes from a combination of tag key/value pairs that allows for flexible queries with very fast aggregations.

Every time series in OpenTSDB must have at least one tag.
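One way to picture this model: a series is identified by its metric name together with the set of its tag key/value pairs, so tag order does not matter and one metric is shared by many series. This is a conceptual sketch only; series_key is a hypothetical helper, and it does not model how OpenTSDB actually stores series.

```python
from typing import Dict, FrozenSet, Tuple

# A series is modeled as (metric name, set of tag key/value pairs).
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(metric: str, tags: Dict[str, str]) -> SeriesKey:
    """Build the identity of one time series from its metric and tags."""
    if not tags:
        # OpenTSDB requires at least one tag on every time series.
        raise ValueError("every time series needs at least one tag")
    return (metric, frozenset(tags.items()))


a = series_key("sys.cpu.user", {"host": "webserver01", "cpu": "0"})
b = series_key("sys.cpu.user", {"cpu": "0", "host": "webserver01"})
c = series_key("sys.cpu.user", {"host": "webserver01", "cpu": "1"})
print(a == b, a == c)  # True False
```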



Take the previous example where the metric was webserver01.sys.cpu.0.user. In OpenTSDB, this may become sys.cpu.user host=webserver01, cpu=0. Now if we want the data for an individual core, we can craft a query like sum:sys.cpu.user{host=webserver01,cpu=42}. If we want all of the cores, we simply drop the cpu tag and ask for sum:sys.cpu.user{host=webserver01}. This will give us the aggregated results for all 64 cores. If we want the results for all 1,000 servers, we simply request sum:sys.cpu.user. The underlying data schema will store all of the sys.cpu.user time series next to each other so that aggregating the individual values is very fast and efficient. OpenTSDB was designed to make these aggregate queries as fast as possible since most users start out at a high level, then drill down for detailed information.
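The three queries above can be mimicked with a toy in-memory model: a query names a metric plus optional tag filters, and every series of that metric whose tags include all of the filters is summed per timestamp. The store, its values and query_sum are hypothetical. A real TSD also interpolates across series whose timestamps don't line up; this sketch only adds values that share exactly the same timestamp.

```python
from collections import defaultdict

# Hypothetical store: (metric, tags, {timestamp: value}) per series.
store = [
    ("sys.cpu.user", {"host": "webserver01", "cpu": "0"}, {1356998400: 1.0}),
    ("sys.cpu.user", {"host": "webserver01", "cpu": "1"}, {1356998400: 3.0}),
    ("sys.cpu.user", {"host": "webserver02", "cpu": "0"}, {1356998400: 5.0}),
]


def query_sum(data, metric, filters=None):
    """Sum, per timestamp, every series of `metric` whose tags include all `filters`."""
    filters = filters or {}
    totals = defaultdict(float)
    for m, tags, points in data:
        if m != metric:
            continue
        # Extra tags on the series are ignored; only the filter tags must match.
        if all(tags.get(k) == v for k, v in filters.items()):
            for ts, value in points.items():
                totals[ts] += value
    return dict(totals)


print(query_sum(store, "sys.cpu.user", {"host": "webserver01", "cpu": "0"}))  # {1356998400: 1.0}
print(query_sum(store, "sys.cpu.user", {"host": "webserver01"}))              # {1356998400: 4.0}
print(query_sum(store, "sys.cpu.user"))                                       # {1356998400: 9.0}
```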

While the tagging system is flexible, some problems can arise if you don't understand how the querying side of OpenTSDB works, hence the need for some forethought. Take the example query above: sum:sys.cpu.user{host=webserver01}. We recorded 64 unique time series for webserver01, one time series for each of the CPU cores. When we issued that query, all of the time series for metric sys.cpu.user with the tag host=webserver01 were retrieved, summed, and returned as one series of numbers. Let's say the resulting sum was 50 for timestamp 1356998400. Now suppose we were migrating from another system to OpenTSDB and had a process that pre-aggregated all 64 cores so that we could quickly get the total, and it simply wrote a new time series sys.cpu.user host=webserver01. If we run the same query, we'll get a value of 100 at 1356998400. What happened? OpenTSDB summed all 64 per-core time series and the pre-aggregated time series to get to that 100. In storage, we would have something like this:

sys.cpu.user host=webserver01        1356998400  50
sys.cpu.user host=webserver01,cpu=0  1356998400  1
sys.cpu.user host=webserver01,cpu=1  1356998400  0
sys.cpu.user host=webserver01,cpu=2  1356998400  2
sys.cpu.user host=webserver01,cpu=3  1356998400  0
...
sys.cpu.user host=webserver01,cpu=63 1356998400  1
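To make the arithmetic concrete, here is a minimal sketch of the double count. It is hypothetical throughout: 4 cores with made-up values stand in for the 64 in the listing, chosen so that they add up to 50, and sum_matching is an illustrative helper. Because the pre-aggregated total is stored under the same metric and also carries host=webserver01, the query picks it up alongside the per-core series.

```python
TS = 1356998400

# Hypothetical per-core values at TS: 4 cores stand in for the 64 in the text.
per_core = {"0": 10.0, "1": 20.0, "2": 15.0, "3": 5.0}

# Per-core series, each tagged with host and cpu.
series = [({"host": "webserver01", "cpu": cpu}, {TS: value})
          for cpu, value in per_core.items()]
# The migrated pre-aggregation job writes the total under the SAME metric,
# tagged only with host=webserver01.
series.append(({"host": "webserver01"}, {TS: sum(per_core.values())}))


def sum_matching(series, filters, ts):
    """Sum the value at `ts` of every series whose tags include all `filters`."""
    total = 0.0
    for tags, points in series:
        if all(tags.get(k) == v for k, v in filters.items()) and ts in points:
            total += points[ts]
    return total


print(sum(per_core.values()))                             # 50.0
print(sum_matching(series, {"host": "webserver01"}, TS))  # 100.0
```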

OpenTSDB will automatically aggregate all of the time series for the metric in a query if no tags are given. If one or more tags are defined, the aggregate will include all time series that match on those tags, regardless of other tags. With the query sum:sys.cpu.user{host=webserver01}, we would include sys.cpu.user host=webserver01,cpu=0 as well as sys.cpu.user host=webserver01,cpu=0,manufacturer=Intel and sys.cpu.user host=webserver01,foo=bar. The pre-aggregated series sys.cpu.user host=webserver01 matches as well, which is why it was added to the per-core series.


The moral of this example is: be careful with your naming schema.
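The matching rule can be written down in a few lines: a series matches when it carries every filter tag with the same value, extra tags are ignored, and an empty filter matches everything. The tag_shapes function is a hypothetical audit helper, not an OpenTSDB feature. It lists the distinct tag-key sets stored under one metric, and more than one shape is a hint that some series will be swept into broader queries. The stored tag sets are illustrative.

```python
def matches(series_tags, filters):
    """True when the series carries every filter tag with the same value.
    Extra tags on the series are ignored, and an empty filter matches all."""
    return all(series_tags.get(k) == v for k, v in filters.items())


def tag_shapes(all_tags):
    """Distinct tag-key sets stored under one metric (hypothetical audit helper)."""
    return sorted({tuple(sorted(tags)) for tags in all_tags})


# Tag sets of some hypothetical sys.cpu.user series.
stored = [
    {"host": "webserver01"},
    {"host": "webserver01", "cpu": "0"},
    {"host": "webserver01", "cpu": "0", "manufacturer": "Intel"},
    {"host": "webserver01", "foo": "bar"},
    {"host": "webserver02", "cpu": "0"},
]

print([matches(t, {"host": "webserver01"}) for t in stored])
# [True, True, True, True, False]
print(tag_shapes(stored))
# [('cpu', 'host'), ('cpu', 'host', 'manufacturer'), ('foo', 'host'), ('host',)]
```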