
Extracted parallel sentences

We collected about 698,973 parallel sentences, containing 8,439,907 words in English and 10,367,940 words in Japanese. About 70% of these sentences are simple sentences, about 20% are complex or compound sentences, and the remaining 10% are compound-complex or very long sentences. Most of the sentences are descriptive text, and the amount of dialog text is small. Consequently, most of these sentences fall outside the travel or tourist domain.
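As a rough illustration of the figures above, the following sketch derives the average sentence length in each language and the approximate number of sentences in each structural category. The variable and function names are hypothetical, and the category counts are only estimates, since the text gives the shares as approximate percentages.

```python
# Figures quoted in the paragraph above.
TOTAL_SENTENCES = 698_973
ENGLISH_WORDS = 8_439_907
JAPANESE_WORDS = 10_367_940

# Approximate shares by sentence structure, as stated in the text.
# The category labels are illustrative names, not the authors' terms.
SHARES = {
    "simple": 0.70,
    "complex_or_compound": 0.20,
    "compound_complex_or_very_long": 0.10,
}


def average_length(words: int, sentences: int) -> float:
    """Average number of words per sentence."""
    return words / sentences


def approximate_counts(total: int, shares: dict) -> dict:
    """Estimated sentence count per category, rounded to whole sentences."""
    return {name: round(total * share) for name, share in shares.items()}


avg_en = average_length(ENGLISH_WORDS, TOTAL_SENTENCES)
avg_ja = average_length(JAPANESE_WORDS, TOTAL_SENTENCES)
counts = approximate_counts(TOTAL_SENTENCES, SHARES)

print(f"English:  {avg_en:.2f} words per sentence")
print(f"Japanese: {avg_ja:.2f} words per sentence")
for name, n in counts.items():
    print(f"{name}: about {n:,} sentences")
```

The Japanese average depends on how the Japanese text was segmented into words, which this section does not specify.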

Table 1 shows examples of our collected Japanese-English parallel sentences.

Table 1: Examples of Extracted Sentences
She was listless and had a vacant stare.
A star shot across the sky.
I arrived in time in spite of a late start.
He started to say something, then thought better of it.
He stared back the way he had come.
He stared at her glassily.
There are varying definitions of a security association in current standards and this paper attempts to clarify these definitions.
In this paper, Richardson extrapolation is used in conjunction with the finite difference method to solve both one- and two-dimensional electrostatics problems.

Table 2 lists each source dictionary or corpus, its type, and the number of sentences extracted from it.

Table 2: Extracted Sentences
ID | Name of Dictionary | Type | # Sentences
AA | 機能試験文集[18] | D | 5,273
AC | アンカー和英辞典 | A | 39,923
AD | アンカー英和辞典 | A | 20,701
AE | 学研英和辞典 | F | 3,826
AF | 基本語用例辞典 | G | 24,000
AI | 英文ビジネスレター文例大辞典[10] | A | 9,355
AJ | 外国人のための日本語例文・問題シリーズ | F | 13,830
AL | SENSEVAL 対訳コーパス | A | 1,096
AM | 講談社和英辞典[19] | A | 40,334
AO | 小倉書店 英語文型・文例辞典[12] | F | 1,330
AQ | 研究社 新編英和活用大辞典[20] | A | 103,064
AR | ランダムハウス英語辞典[13] | A | 39,517
AS | ビジネス技術実用英語大辞典[14] | B | 9,309
AT | コンピュータ用語辞典第3版[15] | A | 3,283
AU | 佐良木コーパス | A | 400
AW | 鳥取大学池原研究室 斎藤健太郎コーパス:比較構文 | D | 143
AX | 鳥取大学池原研究室 澤田康子コーパス:因果関係構文 | D | 334
AY | 英語教師用データベース[24] | D | 758
AZ | 研究社 総合ビジネス英語文例事典 | A | 952
BA | 新実用英語ハンドブック[24] | A | 304
BB | 研究社 新和英大辞典[25] | A | 27,599
BE | エクシード英和辞典 | G | 2,030
BF | 科学技術日英・英日コーパス辞典 | B | 265
BG | 日本語文型辞典 | G | 3,721
BH | 旺文社 マルチ辞書 辞ショック | A | 58,005
CI | 向井京子 英文Eメール文例集 池田書店 [27] | C | 1,360
CK | 読売新聞(文対応データ) [18] | E | 122,078
CO | NHKやさしいビジネス英語 実用フレーズ辞典 | C | 7,055
CQ | 自然科学系和英大辞典 増補改訂新版(小倉書店) | A | 10,195
CR | ジーニアス英和・和英辞典[28] | A | 5,319
CS | 朝日出版社 最新ビジネス英文手紙辞典 CD-ROM版 | A | 2,232
CT | 株式会社アスク 機械を説明する英語 | D | 2,447
DA | IWSLT training | D | 39,953
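To see how the extracted sentences are distributed over the source types, the rows of Table 2 can be tallied by type code. The sketch below is one way to do this; the `ROWS` list and the `tally_by_type` helper are hypothetical names introduced here, and the type codes A to G are copied from the table as-is because this section does not define them. The rows listed in Table 2 need not account for every sentence in the collection.

```python
from collections import defaultdict

# (source ID, type code, number of sentences), copied from Table 2.
ROWS = [
    ("AA", "D", 5273), ("AC", "A", 39923), ("AD", "A", 20701),
    ("AE", "F", 3826), ("AF", "G", 24000), ("AI", "A", 9355),
    ("AJ", "F", 13830), ("AL", "A", 1096), ("AM", "A", 40334),
    ("AO", "F", 1330), ("AQ", "A", 103064), ("AR", "A", 39517),
    ("AS", "B", 9309), ("AT", "A", 3283), ("AU", "A", 400),
    ("AW", "D", 143), ("AX", "D", 334), ("AY", "D", 758),
    ("AZ", "A", 952), ("BA", "A", 304), ("BB", "A", 27599),
    ("BE", "G", 2030), ("BF", "B", 265), ("BG", "G", 3721),
    ("BH", "A", 58005), ("CI", "C", 1360), ("CK", "E", 122078),
    ("CO", "C", 7055), ("CQ", "A", 10195), ("CR", "A", 5319),
    ("CS", "A", 2232), ("CT", "D", 2447), ("DA", "D", 39953),
]


def tally_by_type(rows):
    """Sum the sentence counts for each type code."""
    totals = defaultdict(int)
    for _source_id, type_code, count in rows:
        totals[type_code] += count
    return dict(totals)


totals = tally_by_type(ROWS)
listed_total = sum(totals.values())

# Print the per-type totals in alphabetical order of type code.
for type_code in sorted(totals):
    share = 100 * totals[type_code] / listed_total
    print(f"{type_code}: {totals[type_code]:>7,} sentences ({share:.1f}% of listed rows)")
print(f"Listed rows total: {listed_total:,}")
```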

Jin'ichi Murakami 2007-11-12