Documentary "Why Do We Talk?", Part 5: Collecting Data (print version)

My focus and my aim was to capture the phase

我的研究重点以及目标在于

up to the two-word utterance

捕捉逐渐形成"双词"的过程

and that could happen anywhere

这一过程可能出现在

between the second and third birthday.

两周岁到三周岁之间的任何时间

It turned out my son was an early talker

事实证明我儿子属于学说话很早的孩子

so by the time his second birthday arrived

所以到他两岁生日的时候

we had the main data set we wanted.

就已获取到了我们所需的主要数据

By the time recording was complete,

录制过程结束的时候

more than 240,000 hours of information

他们收集了超过二十四万小时的信息

and 16 million words had been collected.

以及一千六百万个单词

It's a lot of data but in its raw form it's useless

数据很多可原始数据没什么用

and so the challenge this now sets up for us is

因此现阶段我们面临的挑战是

how do you start extracting the right kind of metadata,

如何着手提取出有用的元数据

transcripts of who said what,

谁说过哪些话的文字记录

annotations of where those people were,

那些人身处何方的注解

annotations of how they're moving

他们如何移动的注解

and the relationships that they were in as they were speaking.

还有讲话人处于怎样的关系之中

And these are the tools

这些都是我们正在制作的

that we are now building to analyse the raw data,

用来分析原始数据的工具

and from that, we're starting to see some

之后我们就可以开始观察

early insights into the patterns of language development.

语言发展的早期模式了