Technical Report Series on Corpus Building


Vol. 9 (June 2013)
Swedish Corpora
Uwe Quasthoff, Dirk Goldhahn
Abteilung Automatische Sprachverarbeitung, Institut für Informatik, Universität Leipzig

Affiliation of the authors:
Uwe Quasthoff, Dirk Goldhahn: Institut für Informatik, Universität Leipzig
{quasthoff,

Copyright: Abteilung Automatische Sprachverarbeitung, Institut für Informatik, Universität Leipzig

Technical Report Series on Corpus Building
Vol. 1: Deutscher Wortschatz 2013
Vol. 2: Danish Corpora
Vol. 3: Dutch Corpora
Vol. 4: Icelandic Corpora
Vol. 5: Hungarian Corpora
Vol. 6: Ukrainian Corpora
Vol. 7: Indonesian Corpora
Vol. 8: Czech Corpora
Vol. 9: Swedish Corpora

This PDF document was created using the open source tool mwlib. For more information, see
PDF generated at: 26. June 2013

Contents

Swedish corpora
  Introduction to corpus creation
  SWE - a processing related language description
  SWE corpora
  SWE corpus comparison

Unless noted otherwise, each appendix type below exists once per corpus for all 14 corpora: swe news 2007, 2008, 2009, 2010, 2011 and 2012; swe newscrawl 2011 and 2012; swe web 2002, 2011 and 2012; swe wikipedia 2007 and 2012; swe mixed 2012.

Processing details
  Appendix: Database summary

Content details
  Appendix: Size of different TLDs (all corpora except swe wikipedia 2007 and 2012)
  Appendix: Size of largest domains (all corpora except swe wikipedia 2007 and 2012)
  Appendix: Number of sources by time period (swe news 2007 to 2012 only)

Word details
  Appendix: Words by length without multiplicity (all corpora except swe wikipedia 2007)
  Appendix: Words by length with multiplicity
  Appendix: The most frequent 50 words
  Appendix: Longest words in top by rank

Character N-gram details
  Appendix: Alphabet as used in the top words

Abbreviation details
  Appendix: Most frequent abbreviations
  Appendix: Left neighbors of the full stop
  Appendix: Left neighbors of the full stop with additional internal full stops

Sentences details
  Appendix: Shortest sentences
  Appendix: Longest sentences
  Appendix: Length of sentences in characters
  Appendix: Length of sentences in words

Oddities details
  Appendix: Longest words
  Appendix: Sentences with high average word length
  Appendix: Problems with sentence segmentation - words ending in a stopword

Swedish corpora

Introduction to corpus creation

The Leipzig Corpora Collection (LCC) collects Web-based corpora for many different languages. The main text genres are newspaper texts, Wikipedias and randomly collected web pages. All corpora are processed in the same way:

- Crawling Web pages
- HTML stripping
- Language identification
- Sentence segmentation
- Cleaning: Removal of ill-formed sentences
- Duplicate removal
- Calculation of word frequencies and word co-occurrences

As a result we have a corpus containing only well-formed sentences in the language under consideration. The sentences are in random order; hence, sharing the corpus does not violate copyright law because it is impossible to reconstruct the original texts.

The pre-processing comprises both language-independent steps (like HTML stripping and duplicate removal) and language-dependent steps (like language identification and sentence segmentation). The language-specific parts in particular are vulnerable to specific processing problems. The aim of the paper is to identify possible problems and evaluate the results. The following problems are addressed:

- A processing-focused language description
- Language size: How much text is available for this language? What are the biggest sources?
- Corpus description: Genre, size, crawling and processing date.
- Possible problems in language identification: Which languages are similar?
- Character set and alphabet
- Inspecting the word list: Most frequent words, longer high-frequency words and the longest words overall. Word length distribution.
- Can abbreviations confuse sentence segmentation? Information about the abbreviation list.
- Inspecting sentences: Inspect shortest and longest sentences to identify possible segmentation problems. Sentence length distribution.

The paper describes the results of these inspections; the appendices show the exact results for the different corpora. This helps to compare the corpora with respect to quality.

In the section Quality Overview, an overall quality description for each corpus is given. All corpora contain only minor problems which are irrelevant for most applications. Where this was not the case, corpus creation was repeated.
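The cleaning, duplicate removal and random ordering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the LCC toolchain: the well-formedness pattern, the three-token minimum and the function name `clean_sentences` are assumptions made for the example.

```python
import random
import re

# Hypothetical well-formedness rule (an assumption, not the LCC rule set):
# a sentence starts with an uppercase letter or digit and ends with
# sentence-final punctuation.
WELL_FORMED = re.compile(r"^[A-ZÅÄÖ0-9].*[.!?]$")


def clean_sentences(sentences, seed=0):
    """Drop ill-formed sentences and exact duplicates, then shuffle."""
    seen = set()
    kept = []
    for s in sentences:
        s = " ".join(s.split())  # normalise whitespace before comparing
        if not WELL_FORMED.match(s) or len(s.split()) < 3:
            continue  # cleaning: ill-formed or too short
        if s in seen:
            continue  # duplicate removal (exact duplicates only)
        seen.add(s)
        kept.append(s)
    # Random order, so the original texts cannot be reconstructed.
    random.Random(seed).shuffle(kept)
    return kept


raw = ["Det är en bra dag.", "Det  är en bra dag.", "klicka här", "Hon läser en bok."]
result = clean_sentences(raw)
```

Near-duplicate sentences would pass this exact-match filter, which is consistent with the report's remark that near duplicates are removed only sparingly.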

SWE - a processing related language description

General properties of the Swedish language

Native Name: Svenska
Classification: Indo-European, Germanic, North, East Scandinavian, Danish-Swedish, Swedish
Total Number of Speakers: 8.4M
Largest countries with number of speakers: Sweden (8.0M). Also spoken in parts of Finland, where it has equal legal standing with Finnish. Largely mutually intelligible with Norwegian and Danish.
Source: www.ethnologue.com/language/swe

Processing summary

- Latin alphabet with some additional characters
- Full stop is used as sentence boundary and for abbreviations
- Apostrophes are used rarely

Properties important for processing

Alphabet and punctuation

The alphabet is Latin based, with the following specialities (source: en.wikipedia.org/wiki/Swedish_alphabet):

- Swedish includes all 26 base letters and Å, Ä, Ö.
- In the alphabetic ordering, the letters Å, Ä, Ö follow Z at the end of the alphabet.
- Usual Latin punctuation
- Usage of uppercase letters: At sentence beginnings and for proper names (of persons, organisations, countries etc.).

Sentence segmentation and word tokenization

Sentence beginnings: Sentences begin with a capitalized first word.

Abbreviations:
- Abbreviations can be confused with sentence boundaries: a special abbreviation list has to be inspected.
- Sources for abbreviations:???
- Abbreviations with full stop may appear in the word list without full stop.

Apostrophes: The use of apostrophes is infrequent.
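To illustrate why the abbreviation list matters for segmentation, the following sketch splits text at a full stop only if the token is not a known abbreviation and the next token starts with an uppercase letter or a digit. The abbreviation set and the function name `split_sentences` are assumptions for this example; the report leaves the actual abbreviation sources open.

```python
# Hypothetical abbreviation list (an assumption for illustration only).
ABBREVIATIONS = {"bl.a.", "t.ex.", "dvs.", "kl.", "ca."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Whitespace-token splitter: a boundary needs final punctuation,
    a non-abbreviation token and a capitalised (or numeric) successor."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.endswith((".", "!", "?")):
            continue
        if tok.lower() in abbreviations:
            continue  # full stop belongs to the abbreviation
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is None or nxt[0].isupper() or nxt[0].isdigit():
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "Vi ses kl. 10 i dag. Ta med t.ex. kaffe och bullar. Hon kommer!"
with_list = split_sentences(text)
without_list = split_sentences(text, abbreviations=set())
```

With the list, the text yields three sentences. Without it, "kl." followed by a digit is taken as a boundary, which is the over-generation effect caused by missing abbreviations.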

Sources and ranking (2012)

Estimated number of webpages containing text
- Google.com top-5 words: 337,000,000 results for "och" "i" "att" "som" "på"
- Google.com top-10 words: 232,000,000 results for "och" "i" "att" "som" "på" "är" "en" "av" "för" "med"

Rank according to number of speakers (Ethnologue): 86
Rank according to Wikipedia size (see de.wikipedia.org/wiki/Wikipedia:Sprachen): Rank 5 with articles ( ).
Rank according to number of newspapers as found by AbyZ (5/2012): 256 newspapers, rank 10.
Rank according to number of newspapers with RSS feeds (5/2012): 122 newspapers, rank 13.
Rank according to our corpus size (9/2012): 13

SWE corpora

Quality Overview

Quality Ratings

A: Very good quality. Ready to use (or already used) for frequency dictionary.
  - Size as large as possible
  - Only minimal errors
  - Multiple genres (if possible)
A-: Small problems identified. They should not affect usage.
B: Native speaker quality.
  - Information about abbreviations and sentence boundaries by native speaker
  - Resulting statistics checked by native speaker, possible errors corrected
C: Non-native speaker quality
  - Obvious problems shown in corpus statistics are corrected
D: First version
  - Pre-processing with default abbreviation list and default sentence boundaries
E: Poor quality: Old, outdated or faulty.

Corpus Quality

The quality of the corpora differs slightly because the corpus processing toolchain changed slightly over several years. Moreover, the original data are often no longer available. Hence, improving quality often means removing incomplete or doubtful sentences. Forthcoming editions of all corpora thus might have a slightly smaller number of sentences. This especially applies to near-duplicate sentences, which are removed only sparingly.

The following table shows the quality of the corpora. Minimal errors are still possible and are described in the sections below. All possible major improvements are mentioned here.

Corpus | Quality rating | Known problems | To-dos
swe_news_2007 | A | - | -
swe_news_2008 | A | - | -
swe_news_2009 | A- | Some duplicate sentences | -
swe_news_2010 | A | - | -
swe_news_2011 | A | - | -
swe_news_2012 | A | - | -
swe_newscrawl_2011 | A- | several near duplicate peaks | -
swe_newscrawl_2012 | A | - | -
swe_web_2002 | A- | max. 255 bytes instead of characters | -
swe_web_2011 | A | - | -
swe_web_2012 | A | - | -
swe_wikipedia_2007 | A- | max. 255 bytes instead of characters | -
swe_wikipedia_2012 | A | - | -
swe_mixed_2012 | A | - | -

Processing Overview

For more details, see Appendix: Database Summary and Appendix: Number of sources by time period.

Corpus | Size (M sentences) | Size (M running words) | Multiwords | Crawling date | Production date
swe_news_ mainly 2005 and
swe_news_ daily 2008, 17% without date 2011
swe_news_ daily
swe_news_ daily
swe_news_ daily
swe_news_ daily
swe_newscrawl_ /
swe_newscrawl_ /
swe_web_ batch crawl
swe_web_ / /
swe_web_ / /
swe_wikipedia_ /
swe_wikipedia_ /
swe_mixed_ see above 2013

Content Overview

For more details, see Appendix: Size of different TLDs and Appendix: Size of different domains.

Corpus | Type of sources | Countries | Number of sources | Publishing date | Biggest source
swe_news_2007 | News | .se (93%), .fi (3%), .com (2%) | 113 | mainly 5/ / |
swe_news_2008 | News | .se | | |
swe_news_2009 | News | .se | | |
swe_news_2010 | News | .se | | |
swe_news_2011 | News | .se | | |
swe_news_2012 | News | .se (95%), .ax (5%) | | |
swe_newscrawl_2011 | News | .se (80%), .com (18%) | | and before |
swe_newscrawl_2012 | News | .se (82%), .fi (8%), .com (7%), .nu (2%) | | and before |
swe_web_2002 | Web | .se | | and before |
swe_web_2011 | Web | .se (88%), .com (5%), .fi (4%) | | and before |
swe_web_2012 | Web | .se (86%), .com (7%), .fi (3%) | | and before |
swe_wikipedia_2007 | Wikipedia | | | and before | wikipedia.org
swe_wikipedia_2012 | Wikipedia | | | and before | wikipedia.org
swe_mixed_2012 | Mixed Sources | .se (80%), .com (6%), .fi (3%) | | and before |

Words

Appendix: Words by Length without multiplicity shows a plot of the length distribution over distinct word forms. A smooth, asymmetric bell-shaped curve is expected.

Appendix: Words by Length with multiplicity shows a plot of the length distribution over running words. A smooth, asymmetric bell-shaped curve is expected.

Appendix: The Most Frequent 50 Words shows the most frequent stopwords as well as one or more words related to the region.

Appendix: Longest Words in Top-1000 by rank shows the 25 longest words within the top 1000. They usually give an impression of the main topics treated in the corpus.

Appendix: Longest Words with minimum frequency 2 should give an idea of very long words. In the case of processing problems, different types of non-words may appear. This might help to improve the word definition.
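The difference between the two word length distributions can be made concrete with a short sketch. It assumes that "without multiplicity" counts each distinct word form once and "with multiplicity" counts every occurrence; the function name is hypothetical.

```python
from collections import Counter


def word_length_distributions(tokens):
    """Return (without_multiplicity, with_multiplicity) length histograms."""
    freq = Counter(tokens)
    # Without multiplicity: every distinct word form counts once.
    without_mult = Counter(len(word) for word in freq)
    # With multiplicity: every occurrence counts.
    with_mult = Counter()
    for word, count in freq.items():
        with_mult[len(word)] += count
    return without_mult, with_mult


tokens = "och i att och i och".split()
without_mult, with_mult = word_length_distributions(tokens)
```

Because short stopwords are very frequent, the distribution with multiplicity is shifted towards shorter lengths compared with the one without.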

Corpus | Word length graph without multiplicity | Word length graph with multiplicity | Most Frequent 50 Words | Longest Words in Top-1000 | Longest Words with minimum frequency 2
swe_news_2007 | okay | okay | okay | okay | URLs, missing blanks
swe_news_2008 | okay | okay | okay | okay | missing blanks
swe_news_2009 | okay | okay | okay | okay | missing blanks, routes
swe_news_2010 | okay | okay | okay | okay | missing blanks, junk
swe_news_2011 | okay | okay | okay | Rank 636: 71000@aftonbladet.se | URLs, missing blanks
swe_news_2012 | okay | okay | okay | okay | okay
swe_newscrawl_2011 | okay | okay | okay | okay | Missing blanks, routes, junk, URLs
swe_newscrawl_2012 | okay | okay | okay | okay | URLs, missing blanks, junk, etc.
swe_web_2002 | okay | okay | okay | okay | URLs, missing blanks, chemicals
swe_web_2011 | okay | okay | okay | okay | Routes, URLs, missing blanks, junk
swe_web_2012 | okay | okay | okay | okay | Routes, missing blanks, URLs, junk
swe_wikipedia_2007 | okay | okay | okay | Rank 971: RobotQuistnix | Routes, URLs
swe_wikipedia_2012 | okay | okay | okay | okay | URLs
swe_mixed_2012 | okay | okay | okay | okay | all of the above

Abbreviations

A full stop that ends an abbreviation usually does not mark a sentence boundary. Conversely, abbreviations missing from the list can overgenerate sentence boundaries. Due to limitations in the processing chain, the list of abbreviations used for sentence boundary detection can differ from the abbreviations in the word list.

Appendix: Most Frequent Abbreviations shows possible under-generation of sentence boundaries by wrong abbreviations (i.e. words ending in a full stop) in the word list.

Sentences

Appendix: Shortest sentences shows the shortest declarative, exclamatory and interrogative sentences. In preprocessing, a minimal length for sentences might be specified. Missing abbreviations are often visible as faulty sentence endings.

Appendix: Longest sentences shows the longest declarative, exclamatory and interrogative sentences. Usually, the maximum sentence length is defined as 256 characters (not 256 bytes). Very long exclamatory or interrogative sentences often contain an overlooked sentence boundary.

Appendix: Length of sentences in characters shows the distribution of the sentence length. A large and balanced corpus will result in a smooth, bell-shaped curve. Isolated local maxima usually result from large sets of near-duplicate sentences.
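The distinction between a length limit in characters and one in bytes matters for Swedish because Å, Ä and Ö take more than one byte in common encodings. The sketch below assumes UTF-8 storage, which the report does not state; the function names are hypothetical.

```python
MAX_LEN = 256  # maximum sentence length named in the report


def fits_in_characters(sentence, limit=MAX_LEN):
    """Length check in Unicode characters (the intended definition)."""
    return len(sentence) <= limit


def fits_in_bytes(sentence, limit=MAX_LEN, encoding="utf-8"):
    """Length check in encoded bytes (assumption: UTF-8 storage)."""
    return len(sentence.encode(encoding)) <= limit


sample = "å" * 200  # 200 characters, two bytes each in UTF-8
char_ok = fits_in_characters(sample)
byte_ok = fits_in_bytes(sample)
```

A byte-based limit therefore cuts off sentences with many non-ASCII letters earlier than intended, which is the kind of problem recorded as "max. 255 bytes" for some of the older corpora.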

Corpus | Shortest sentences | Longest sentences | Length distribution (in characters) | Length distribution (in words)
swe_news_2007 | okay | max. 255 bytes instead of characters | okay | okay
swe_news_2008 | okay | okay | okay | okay
swe_news_2009 | Some duplicate sentences | okay | okay | okay
swe_news_2010 | okay | okay | okay | okay
swe_news_2011 | okay | okay | okay | okay
swe_news_2012 | okay | okay | okay | okay
swe_newscrawl_2011 | okay | okay | several near duplicate peaks | okay
swe_newscrawl_2012 | okay | okay | okay | okay
swe_web_2002 | okay | okay | max. 255 bytes instead of characters | okay
swe_web_2011 | okay | okay | okay | okay
swe_web_2012 | okay | okay | okay | okay
swe_wikipedia_2007 | okay | okay | max. 255 bytes instead of characters | okay
swe_wikipedia_2012 | okay | okay | okay | okay
swe_mixed_2012 | okay | okay | okay | okay

Oddities

Appendix: Sentences with high average word length: Average sentences contain many stopwords, and these stopwords are usually short. Hence, they limit the average word length in a sentence. Conversely, sentences with high average word length are often ill-formed. They may be used to improve pre-processing.

Appendix: Problems with sentence segmentation - Words ending in a stopword: If there are many ill-formed word or sentence boundaries without a blank between two words, they will generate new ill-formed words. The appendix shows the most frequent words ending in an uppercase stopword. If they are infrequent, the data are of high quality.
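Both oddity checks can be sketched in a few lines. The stopword tuple below is an assumption built from capitalised forms of frequent words quoted earlier in the report, and the function names are hypothetical; the largest count in the result corresponds to the kind of value reported as maxfreq.

```python
from collections import Counter

# Hypothetical capitalised stopwords (an assumption for illustration).
STOPWORDS = ("Och", "Att", "Som", "En", "Av", "För", "Med")


def average_word_length(sentence):
    """Mean length of the whitespace-separated words of a sentence."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def words_ending_in_stopword(tokens, stopwords=STOPWORDS):
    """Count tokens like 'dagenOch' where a blank is probably missing."""
    hits = Counter()
    for tok in tokens:
        for sw in stopwords:
            # The stopword must be a proper suffix that directly follows
            # a lowercase letter, which signals a missing blank.
            if len(tok) > len(sw) and tok.endswith(sw) and tok[-len(sw) - 1].islower():
                hits[tok] += 1
                break
    return hits


tokens = ["dagenOch", "Och", "hundEn", "dagenOch", "boken"]
hits = words_ending_in_stopword(tokens)
maxfreq = max(hits.values())
```

Sentences whose `average_word_length` is far above the corpus mean are candidates for manual inspection, since URLs and words with missing blanks inflate that value.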
Corpus                Sentences with high average word length    Words ending in a stopword
swe_news_2007         missing blanks                              maxfreq=48
swe_news_2008         routes, proper names                        okay, maxfreq=8
swe_news_2009         okay                                        okay, maxfreq=11
swe_news_2010         URLs, missing blanks, routes                maxfreq=19
swe_news_2011         okay                                        maxfreq=17
swe_news_2012         okay                                        okay, maxfreq=4
swe_newscrawl_2011    URLs, missing blanks, junk                  maxfreq=203
swe_newscrawl_2012    missing blanks, junk                        maxfreq=94
swe_web_2002          URLs, junk, special characters              maxfreq=33
swe_web_2011          URLs, missing blanks, routes, junk          maxfreq=32
swe_web_2012          URLs, missing blanks, routes, junk          okay, maxfreq=12
swe_wikipedia_2007    URLs, chemicals, routes                     okay
swe_wikipedia_2012    URLs, Japanese, routes                      okay
swe_mixed_2012        as above                                    maxfreq=203

SWE corpus comparison

Automated Corpus comparison

The comparisons below are based on the following tests on the top-1000 words: Vectors based on the frequencies of the top-1000 words are created for the analysed languages. A distance is derived from the cosine of the angle between these vectors, so that identical languages receive a value of 0 and distinct languages a value of 1. The same analysis is conducted using the frequencies of the top-1000 typical letter trigrams of the languages.

Monolingual word list comparison (top-1000 words)

As one can expect, the comparisons show:
- The different news corpora have different word lists, with a maximum distance of 0.23 (swe_newscrawl_2011 and swe_news_2011).
- The wikipedia corpora are similar, with a maximum distance of 0.09.
- The web corpora have a maximum distance of 0.18 (swe_web_2002 and swe_web_2012).
- The mixed corpus swe_mixed_2012 holds a central position, with a maximum distance of 0.32 to the other corpora.

Multilingual word list comparison (top-1000 words)

Both the comparison of the top-1000 words and the comparison of the letter trigrams used in these words show that there are similar languages in our data, mainly members of the North Germanic family. The distance of the mixed corpus to the next language, Slovak, is 0.47 for the words and 0.54 for the letter trigrams. Both distances are below average. The average value for the most similar language is 0.58 for trigrams.

The most similar languages based on words: Danish, Norwegian (Bokmål), Norwegian (Nynorsk)

source    language_short_name    language_name               cos_logfreq
swe       dan                    Danish
swe       nob                    Norwegian, Bokmål
swe       nno                    Norwegian, Nynorsk
swe       loy                    Loke
swe       cat                    Catalan-Valencian-Balear

The most similar languages based on letter trigrams: Danish, Norwegian (Bokmål), Norwegian (Nynorsk)

source    language_short_name    language_name               cos_logfreq
swe       dan                    Danish
swe       nob                    Norwegian, Bokmål
swe       nno                    Norwegian, Nynorsk
swe       eng                    English
swe       nld                    Dutch
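The comparison described above can be illustrated with a minimal sketch. The report does not give the exact formula, so the following are assumptions: the distance is taken as 1 minus the cosine, frequencies are log-scaled (suggested only by the column name cos_logfreq), and the function names top_n, letter_trigrams and cosine_distance are hypothetical.

```python
import math
from collections import Counter


def top_n(freqs, n=1000):
    """Keep the n most frequent items of a frequency table."""
    return dict(Counter(freqs).most_common(n))


def letter_trigrams(word_freqs):
    """Frequency-weighted letter trigrams of the words in a frequency list."""
    grams = Counter()
    for word, freq in word_freqs.items():
        for i in range(len(word) - 2):
            grams[word[i:i + 3]] += freq
    return grams


def cosine_distance(freqs_a, freqs_b, n=1000):
    """Assumed distance: 1 - cosine of the top-n log-frequency vectors."""
    a = {k: math.log(1 + v) for k, v in top_n(freqs_a, n).items()}
    b = {k: math.log(1 + v) for k, v in top_n(freqs_b, n).items()}
    # Only items present in both lists contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty list shares nothing with any other list
    return 1.0 - dot / (norm_a * norm_b)
```

With this definition, two identical word lists give a distance of 0 (up to rounding) and two lists without any common item give 1. The trigram comparison would reuse cosine_distance on the output of letter_trigrams.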

Processing details

Appendix to swe news 2007: Database summary

Values for some general parameters

Parameter                                    Value
Number of sentences
Number of running word forms
Number of distinct word forms
Number of multiwords                         0
Percentage of words with frequency=
Number of sentence based co-occurrences
Number of neighbour co-occurrences

Appendix to swe news 2008: Database summary

Values for some general parameters

Parameter                                    Value
Number of sentences
Number of running word forms
Number of distinct word forms
Number of multiwords
Percentage of words with frequency=
Number of sentence based co-occurrences
Number of neighbour co-occurrences
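As a rough illustration of how parameters of this kind could be derived from a tokenised corpus, here is a minimal sketch. It makes several assumptions that are not taken from the report: co-occurrences are counted as distinct word pairs without any significance filter, the frequency value in the truncated "Percentage of words with frequency=" row is left open as the hypothetical parameter freq_threshold, and the function name database_summary and the result keys are invented for this example.

```python
from collections import Counter
from itertools import combinations


def database_summary(sentences, freq_threshold=1):
    """Summary parameters for a list of tokenised sentences (lists of words)."""
    word_freq = Counter(w for s in sentences for w in s)
    sentence_pairs = Counter()
    neighbour_pairs = Counter()
    for s in sentences:
        # Unordered pairs of distinct word forms that share a sentence.
        sentence_pairs.update(combinations(sorted(set(s)), 2))
        # Ordered pairs of directly adjacent words.
        neighbour_pairs.update(zip(s, s[1:]))
    distinct = len(word_freq)
    # freq_threshold is a placeholder; the source does not state the value.
    matching = sum(1 for f in word_freq.values() if f == freq_threshold)
    return {
        "sentences": len(sentences),
        "running_word_forms": sum(word_freq.values()),
        "distinct_word_forms": distinct,
        "pct_words_with_freq": 100.0 * matching / distinct if distinct else 0.0,
        "sentence_cooccurrences": len(sentence_pairs),
        "neighbour_cooccurrences": len(neighbour_pairs),
    }
```

A production pipeline would normally keep only statistically significant co-occurrences, so the two co-occurrence counts in this sketch are upper bounds rather than comparable figures.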

Isolda Purchase - EDI

Isolda Purchase - EDI Isolda Purchase - EDI Document v 1.0 1 Table of Contents Table of Contents... 2 1 Introduction... 3 1.1 What is EDI?... 4 1.2 Sending and receiving documents... 4 1.3 File format... 4 1.3.1 XML (language

Läs mer

Grafisk teknik IMCDP IMCDP IMCDP. IMCDP(filter) Sasan Gooran (HT 2006) Assumptions:

Grafisk teknik IMCDP IMCDP IMCDP. IMCDP(filter) Sasan Gooran (HT 2006) Assumptions: IMCDP Grafisk teknik The impact of the placed dot is fed back to the original image by a filter Original Image Binary Image Sasan Gooran (HT 2006) The next dot is placed where the modified image has its

Läs mer

Schenker Privpak AB Telefon VAT Nr. SE Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr Säte: Borås

Schenker Privpak AB Telefon VAT Nr. SE Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr Säte: Borås Schenker Privpak AB Interface documentation for web service packageservices.asmx 2012-09-01 Version: 1.0.0 Doc. no.: I04304b Sida 2 av 7 Revision history Datum Version Sign. Kommentar 2012-09-01 1.0.0

Läs mer

SVENSK STANDARD SS-EN ISO 19108:2005/AC:2015

SVENSK STANDARD SS-EN ISO 19108:2005/AC:2015 SVENSK STANDARD SS-EN ISO 19108:2005/AC:2015 Fastställd/Approved: 2015-07-23 Publicerad/Published: 2016-05-24 Utgåva/Edition: 1 Språk/Language: engelska/english ICS: 35.240.70 Geografisk information Modell

Läs mer

This exam consists of four problems. The maximum sum of points is 20. The marks 3, 4 and 5 require a minimum

This exam consists of four problems. The maximum sum of points is 20. The marks 3, 4 and 5 require a minimum Examiner Linus Carlsson 016-01-07 3 hours In English Exam (TEN) Probability theory and statistical inference MAA137 Aids: Collection of Formulas, Concepts and Tables Pocket calculator This exam consists

Läs mer

Information technology Open Document Format for Office Applications (OpenDocument) v1.0 (ISO/IEC 26300:2006, IDT) SWEDISH STANDARDS INSTITUTE

Information technology Open Document Format for Office Applications (OpenDocument) v1.0 (ISO/IEC 26300:2006, IDT) SWEDISH STANDARDS INSTITUTE SVENSK STANDARD SS-ISO/IEC 26300:2008 Fastställd/Approved: 2008-06-17 Publicerad/Published: 2008-08-04 Utgåva/Edition: 1 Språk/Language: engelska/english ICS: 35.240.30 Information technology Open Document

Läs mer

Grafisk teknik IMCDP. Sasan Gooran (HT 2006) Assumptions:

Grafisk teknik IMCDP. Sasan Gooran (HT 2006) Assumptions: Grafisk teknik Sasan Gooran (HT 2006) Iterative Method Controlling Dot Placement (IMCDP) Assumptions: The original continuous-tone image is scaled between 0 and 1 0 and 1 represent white and black respectively

Läs mer

Writing with context. Att skriva med sammanhang

Writing with context. Att skriva med sammanhang Writing with context Att skriva med sammanhang What makes a piece of writing easy and interesting to read? Discuss in pairs and write down one word (in English or Swedish) to express your opinion http://korta.nu/sust(answer

Läs mer

Schenker Privpak AB Telefon 033-178300 VAT Nr. SE556124398001 Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr 033-257475 Säte: Borås

Schenker Privpak AB Telefon 033-178300 VAT Nr. SE556124398001 Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr 033-257475 Säte: Borås Schenker Privpak AB Interface documentation for Parcel Search 2011-10-18 Version: 1 Doc. no.: I04306 Sida 2 av 5 Revision history Datum Version Sign. Kommentar 2011-10-18 1.0.0 PD First public version.

Läs mer

Grafisk teknik. Sasan Gooran (HT 2006)

Grafisk teknik. Sasan Gooran (HT 2006) Grafisk teknik Sasan Gooran (HT 2006) Iterative Method Controlling Dot Placement (IMCDP) Assumptions: The original continuous-tone image is scaled between 0 and 1 0 and 1 represent white and black respectively

Läs mer

Styrteknik: Binära tal, talsystem och koder D3:1

Styrteknik: Binära tal, talsystem och koder D3:1 Styrteknik: Binära tal, talsystem och koder D3:1 Digitala kursmoment D1 Boolesk algebra D2 Grundläggande logiska funktioner D3 Binära tal, talsystem och koder Styrteknik :Binära tal, talsystem och koder

Läs mer

Schenker Privpak AB Telefon 033-178300 VAT Nr. SE556124398001 Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr 033-257475 Säte: Borås

Schenker Privpak AB Telefon 033-178300 VAT Nr. SE556124398001 Schenker ABs ansvarsbestämmelser, identiska med Box 905 Faxnr 033-257475 Säte: Borås Schenker Privpak AB Interface documentation for web service packageservices.asmx 2010-10-21 Version: 1.2.2 Doc. no.: I04304 Sida 2 av 14 Revision history Datum Version Sign. Kommentar 2010-02-18 1.0.0

Läs mer

Isometries of the plane

Isometries of the plane Isometries of the plane Mikael Forsberg August 23, 2011 Abstract Här följer del av ett dokument om Tesselering som jag skrivit för en annan kurs. Denna del handlar om isometrier och innehåller bevis för

Läs mer

A study of the performance

A study of the performance A study of the performance and utilization of the Swedish railway network Anders Lindfeldt Royal Institute of Technology 2011-02-03 Introduction The load on the railway network increases steadily, and

Läs mer

Adding active and blended learning to an introductory mechanics course

Adding active and blended learning to an introductory mechanics course Adding active and blended learning to an introductory mechanics course Ulf Gran Chalmers, Physics Background Mechanics 1 for Engineering Physics and Engineering Mathematics (SP2/3, 7.5 hp) 200+ students

Läs mer

Stiftelsen Allmänna Barnhuset KARLSTADS UNIVERSITET

Stiftelsen Allmänna Barnhuset KARLSTADS UNIVERSITET Stiftelsen Allmänna Barnhuset KARLSTADS UNIVERSITET National Swedish parental studies using the same methodology have been performed in 1980, 2000, 2006 and 2011 (current study). In 1980 and 2000 the studies

Läs mer

1. Compute the following matrix: (2 p) 2. Compute the determinant of the following matrix: (2 p)

1. Compute the following matrix: (2 p) 2. Compute the determinant of the following matrix: (2 p) UMEÅ UNIVERSITY Department of Mathematics and Mathematical Statistics Pre-exam in mathematics Linear algebra 2012-02-07 1. Compute the following matrix: (2 p 3 1 2 3 2 2 7 ( 4 3 5 2 2. Compute the determinant

Läs mer

SVENSK STANDARD SS

SVENSK STANDARD SS Provläsningsexemplar / Preview SVENSK STANDARD Handläggande organ Fastställd Utgåva Sida Allmänna Standardiseringsgruppen, STG 1998-01-30 1 1 (13) SIS FASTSTÄLLER OCH UTGER SVENSK STANDARD SAMT SÄLJER

Läs mer

Webbregistrering pa kurs och termin

Webbregistrering pa kurs och termin Webbregistrering pa kurs och termin 1. Du loggar in på www.kth.se via den personliga menyn Under fliken Kurser och under fliken Program finns på höger sida en länk till Studieöversiktssidan. På den sidan

Läs mer

Preschool Kindergarten

Preschool Kindergarten Preschool Kindergarten Objectives CCSS Reading: Foundational Skills RF.K.1.D: Recognize and name all upper- and lowercase letters of the alphabet. RF.K.3.A: Demonstrate basic knowledge of one-toone letter-sound

Läs mer

Viktig information för transmittrar med option /A1 Gold-Plated Diaphragm

Viktig information för transmittrar med option /A1 Gold-Plated Diaphragm Viktig information för transmittrar med option /A1 Gold-Plated Diaphragm Guldplätering kan aldrig helt stoppa genomträngningen av vätgas, men den får processen att gå långsammare. En tjock guldplätering

Läs mer

6 th Grade English October 6-10, 2014

6 th Grade English October 6-10, 2014 6 th Grade English October 6-10, 2014 Understand the content and structure of a short story. Imagine an important event or challenge in the future. Plan, draft, revise and edit a short story. Writing Focus

Läs mer

Support for Artist Residencies

Support for Artist Residencies 1. Basic information 1.1. Name of the Artist-in-Residence centre 0/100 1.2. Name of the Residency Programme (if any) 0/100 1.3. Give a short description in English of the activities that the support is

Läs mer

Module 6: Integrals and applications

Module 6: Integrals and applications Department of Mathematics SF65 Calculus Year 5/6 Module 6: Integrals and applications Sections 6. and 6.5 and Chapter 7 in Calculus by Adams and Essex. Three lectures, two tutorials and one seminar. Important

Läs mer

CHANGE WITH THE BRAIN IN MIND. Frukostseminarium 11 oktober 2018

CHANGE WITH THE BRAIN IN MIND. Frukostseminarium 11 oktober 2018 CHANGE WITH THE BRAIN IN MIND Frukostseminarium 11 oktober 2018 EGNA FÖRÄNDRINGAR ü Fundera på ett par förändringar du drivit eller varit del av ü De som gått bra och det som gått dåligt. Vi pratar om

Läs mer

SAMMANFATTNING AV SUMMARY OF

SAMMANFATTNING AV SUMMARY OF Detta dokument är en enkel sammanfattning i syfte att ge en första orientering av investeringsvillkoren. Fullständiga villkor erhålles genom att registera sin e- postadress på ansökningssidan för FastForward

Läs mer

Bridging the gap - state-of-the-art testing research, Explanea, and why you should care

Bridging the gap - state-of-the-art testing research, Explanea, and why you should care Bridging the gap - state-of-the-art testing research, Explanea, and why you should care Robert Feldt Blekinge Institute of Technology & Chalmers All animations have been excluded in this pdf version! onsdag

Läs mer

Rastercell. Digital Rastrering. AM & FM Raster. Rastercell. AM & FM Raster. Sasan Gooran (VT 2007) Rastrering. Rastercell. Konventionellt, AM

Rastercell. Digital Rastrering. AM & FM Raster. Rastercell. AM & FM Raster. Sasan Gooran (VT 2007) Rastrering. Rastercell. Konventionellt, AM Rastercell Digital Rastrering Hybridraster, Rastervinkel, Rotation av digitala bilder, AM/FM rastrering Sasan Gooran (VT 2007) Önskat mått * 2* rastertätheten = inläsningsupplösning originalets mått 2

Läs mer

Accomodations at Anfasteröd Gårdsvik, Ljungskile

Accomodations at Anfasteröd Gårdsvik, Ljungskile Accomodations at Anfasteröd Gårdsvik, Ljungskile Anfasteröd Gårdsvik is a campsite and resort, located right by the sea and at the edge of the forest, south west of Ljungskile. We offer many sorts of accommodations

Läs mer

EXPERT SURVEY OF THE NEWS MEDIA

EXPERT SURVEY OF THE NEWS MEDIA EXPERT SURVEY OF THE NEWS MEDIA THE SHORENSTEIN CENTER ON THE PRESS, POLITICS & PUBLIC POLICY JOHN F. KENNEDY SCHOOL OF GOVERNMENT, HARVARD UNIVERSITY, CAMBRIDGE, MA 0238 PIPPA_NORRIS@HARVARD.EDU. FAX:

Läs mer

WindPRO version 2.7.448 feb 2010. SHADOW - Main Result. Calculation: inkl Halmstad SWT 2.3. Assumptions for shadow calculations. Shadow receptor-input

WindPRO version 2.7.448 feb 2010. SHADOW - Main Result. Calculation: inkl Halmstad SWT 2.3. Assumptions for shadow calculations. Shadow receptor-input SHADOW - Main Result Calculation: inkl Halmstad SWT 2.3 Assumptions for shadow calculations Maximum distance for influence Calculate only when more than 20 % of sun is covered by the blade Please look

Läs mer

Make a speech. How to make the perfect speech. söndag 6 oktober 13

Make a speech. How to make the perfect speech. söndag 6 oktober 13 Make a speech How to make the perfect speech FOPPA FOPPA Finding FOPPA Finding Organizing FOPPA Finding Organizing Phrasing FOPPA Finding Organizing Phrasing Preparing FOPPA Finding Organizing Phrasing

Läs mer

En bild säger mer än tusen ord?

En bild säger mer än tusen ord? Faculteit Letteren en Wijsbegeerte Academiejaar 2009-2010 En bild säger mer än tusen ord? En studie om dialogen mellan illustrationer och text i Tiina Nunnallys engelska översättning av Pippi Långstrump

Läs mer

1. Unpack content of zip-file to temporary folder and double click Setup

1. Unpack content of zip-file to temporary folder and double click Setup Instruktioner Dokumentnummer/Document Number Titel/Title Sida/Page 13626-1 BM800 Data Interface - Installation Instructions 1/8 Utfärdare/Originator Godkänd av/approved by Gäller från/effective date Mats

Läs mer

Metodprov för kontroll av svetsmutterförband Kontrollbestämmelse Method test for inspection of joints of weld nut Inspection specification

Metodprov för kontroll av svetsmutterförband Kontrollbestämmelse Method test for inspection of joints of weld nut Inspection specification Stämpel/Etikett Security stamp/lable Metodprov för kontroll av svetsmutterförband Kontrollbestämmelse Method test for inspection of joints of weld nut Inspection specification Granskad av Reviewed by Göran

Läs mer

Calculate check digits according to the modulus-11 method

Calculate check digits according to the modulus-11 method 2016-12-01 Beräkning av kontrollsiffra 11-modulen Calculate check digits according to the modulus-11 method Postadress: 105 19 Stockholm Besöksadress: Palmfeltsvägen 5 www.bankgirot.se Bankgironr: 160-9908

Läs mer

Boiler with heatpump / Värmepumpsberedare

Boiler with heatpump / Värmepumpsberedare Boiler with heatpump / Värmepumpsberedare QUICK START GUIDE / SNABBSTART GUIDE More information and instruction videos on our homepage www.indol.se Mer information och instruktionsvideos på vår hemsida

Läs mer

Protokoll Föreningsutskottet 2013-10-22

Protokoll Föreningsutskottet 2013-10-22 Protokoll Föreningsutskottet 2013-10-22 Närvarande: Oliver Stenbom, Andreas Estmark, Henrik Almén, Ellinor Ugland, Oliver Jonstoij Berg. 1. Mötets öppnande. Ordförande Oliver Stenbom öppnade mötet. 2.

Läs mer

http://marvel.com/games/play/31/create_your_own_superhero http://www.heromachine.com/

http://marvel.com/games/play/31/create_your_own_superhero http://www.heromachine.com/ Name: Year 9 w. 4-7 The leading comic book publisher, Marvel Comics, is starting a new comic, which it hopes will become as popular as its classics Spiderman, Superman and The Incredible Hulk. Your job

Läs mer

Aborter i Sverige 2008 januari juni

Aborter i Sverige 2008 januari juni HÄLSA OCH SJUKDOMAR 2008:9 Aborter i Sverige 2008 januari juni Preliminär sammanställning SVERIGES OFFICIELLA STATISTIK Statistik Hälsa och Sjukdomar Aborter i Sverige 2008 januari juni Preliminär sammanställning

Läs mer

Resultat av den utökade första planeringsövningen inför RRC september 2005

Resultat av den utökade första planeringsövningen inför RRC september 2005 Resultat av den utökade första planeringsövningen inför RRC-06 23 september 2005 Resultat av utökad första planeringsövning - Tillägg av ytterligare administrativa deklarationer - Variant (av case 4) med

Läs mer

SVENSK STANDARD SS-ISO 8779:2010/Amd 1:2014

SVENSK STANDARD SS-ISO 8779:2010/Amd 1:2014 SVENSK STANDARD SS-ISO 8779:2010/Amd 1:2014 Fastställd/Approved: 2014-07-04 Publicerad/Published: 2014-07-07 Utgåva/Edition: 1 Språk/Language: engelska/english ICS: 23.040.20; 65.060.35; 83.140.30 Plaströrssystem

Läs mer

PORTSECURITY IN SÖLVESBORG

PORTSECURITY IN SÖLVESBORG PORTSECURITY IN SÖLVESBORG Kontaktlista i skyddsfrågor / List of contacts in security matters Skyddschef/PFSO Tord Berg Phone: +46 456 422 44. Mobile: +46 705 82 32 11 Fax: +46 456 104 37. E-mail: tord.berg@sbgport.com

Läs mer

CUSTOMER READERSHIP HARRODS MAGAZINE CUSTOMER OVERVIEW. 63% of Harrods Magazine readers are mostly interested in reading about beauty

CUSTOMER READERSHIP HARRODS MAGAZINE CUSTOMER OVERVIEW. 63% of Harrods Magazine readers are mostly interested in reading about beauty 79% of the division trade is generated by Harrods Rewards customers 30% of our Beauty clients are millennials 42% of our trade comes from tax-free customers 73% of the department base is female Source:

Läs mer

Managing addresses in the City of Kokkola Underhåll av adresser i Karleby stad

Managing addresses in the City of Kokkola Underhåll av adresser i Karleby stad Managing addresses in the City of Kokkola Underhåll av adresser i Karleby stad Nordic Address Meeting Odense 3.-4. June 2010 Asko Pekkarinen Anna Kujala Facts about Kokkola Fakta om Karleby Population:

Läs mer

Eternal Employment Financial Feasibility Study

Eternal Employment Financial Feasibility Study Eternal Employment Financial Feasibility Study 2017-08-14 Assumptions Available amount: 6 MSEK Time until first payment: 7 years Current wage: 21 600 SEK/month (corresponding to labour costs of 350 500

Läs mer

12.6 Heat equation, Wave equation

12.6 Heat equation, Wave equation 12.6 Heat equation, 12.2-3 Wave equation Eugenia Malinnikova, NTNU September 26, 2017 1 Heat equation in higher dimensions The heat equation in higher dimensions (two or three) is u t ( = c 2 2 ) u x 2

Läs mer

Documentation SN 3102

Documentation SN 3102 This document has been created by AHDS History and is based on information supplied by the depositor /////////////////////////////////////////////////////////// THE EUROPEAN STATE FINANCE DATABASE (Director:

Läs mer

Questionnaire for visa applicants Appendix A

Questionnaire for visa applicants Appendix A Questionnaire for visa applicants Appendix A Business Conference visit 1 Personal particulars Surname Date of birth (yr, mth, day) Given names (in full) 2 Your stay in Sweden A. Who took the initiative

Läs mer

Surfaces for sports areas Determination of vertical deformation. Golvmaterial Sportbeläggningar Bestämning av vertikal deformation

Surfaces for sports areas Determination of vertical deformation. Golvmaterial Sportbeläggningar Bestämning av vertikal deformation SVENSK STANDARD SS-EN 14809:2005/AC:2007 Fastställd/Approved: 2007-11-05 Publicerad/Published: 2007-12-03 Utgåva/Edition: 1 Språk/Language: engelska/english ICS: 97.220.10 Golvmaterial Sportbeläggningar

Läs mer

SVENSK STANDARD SS

SVENSK STANDARD SS SVENSK STANDARD SS 03 54 14 Handläggande organ Fastställd Utgåva Sida Allmänna Standardiseringsgruppen, STG 1998-11-13 1 1 (3+21) INNEHÅLLET I SVENSK STANDARD ÄR UPPHOVSRÄTTSLIGT SKYDDAT. SIS HAR COPYRIGHT

Läs mer

Measuring void content with GPR Current test with PaveScan and a comparison with traditional GPR systems. Martin Wiström, Ramboll RST

Measuring void content with GPR Current test with PaveScan and a comparison with traditional GPR systems. Martin Wiström, Ramboll RST Measuring void content with GPR Current test with PaveScan and a comparison with traditional GPR systems Martin Wiström, Ramboll RST Hålrum med GPR SBUF-projekt pågår för att utvärdera möjligheterna att

Läs mer

EXTERNAL ASSESSMENT SAMPLE TASKS SWEDISH BREAKTHROUGH LSPSWEB/0Y09

EXTERNAL ASSESSMENT SAMPLE TASKS SWEDISH BREAKTHROUGH LSPSWEB/0Y09 EXTENAL ASSESSENT SAPLE TASKS SWEDISH BEAKTHOUGH LSPSWEB/0Y09 Asset Languages External Assessment Sample Tasks Breakthrough Stage Listening and eading Swedish Contents Page Introduction 2 Listening Sample

Läs mer

Annonsformat desktop. Startsida / områdesstartsidor. Artikel/nyhets-sidor. 1. Toppbanner, format 1050x180 pxl. Format 1060x180 px + 250x240 pxl.

Annonsformat desktop. Startsida / områdesstartsidor. Artikel/nyhets-sidor. 1. Toppbanner, format 1050x180 pxl. Format 1060x180 px + 250x240 pxl. Annonsformat desktop Startsida / områdesstartsidor 1. Toppbanner, format 1050x180 pxl. Bigbang (toppbanner + bannerplats 2) Format 1060x180 px + 250x240 pxl. 2. DW, format 250x240 pxl. 3. TW, format 250x360

Läs mer

Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas

Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas The Distributed Application Debugger is a debugging tool for parallel programs Targets the MPI platform Runs remotley even on private

Läs mer

NORDIC GRID DISTURBANCE STATISTICS 2012

NORDIC GRID DISTURBANCE STATISTICS 2012 NORDIC GRID DISTURBANCE STATISTICS 2012 Utdrag ur rapport utarbetad av DISTAC-gruppen under RGN inom ENTSO-E Sture Holmström 2 Korta bakgrundsfakta > 1999-2000 utarbetades Riktlinjer för klassificering

Läs mer

SVENSK STANDARD SS :2010

SVENSK STANDARD SS :2010 SVENSK STANDARD SS 8760009:2010 Fastställd/Approved: 2010-03-22 Publicerad/Published: 2010-04-27 Utgåva/Edition: 2 Språk/Language: svenska/swedish ICS: 11.140 Sjukvårdstextil Sortering av undertrikå vid

Läs mer

Materialplanering och styrning på grundnivå. 7,5 högskolepoäng

Materialplanering och styrning på grundnivå. 7,5 högskolepoäng Materialplanering och styrning på grundnivå Provmoment: Ladokkod: Tentamen ges för: Skriftlig tentamen TI6612 Af3-Ma, Al3, Log3,IBE3 7,5 högskolepoäng Namn: (Ifylles av student) Personnummer: (Ifylles

Läs mer

Measuring child participation in immunization registries: two national surveys, 2001

Measuring child participation in immunization registries: two national surveys, 2001 Measuring child participation in immunization registries: two national surveys, 2001 Diana Bartlett Immunization Registry Support Branch National Immunization Program Objectives Describe the progress of

Läs mer

Kurskod: TAMS28 MATEMATISK STATISTIK Provkod: TEN1 05 June 2017, 14:00-18:00. English Version

Kurskod: TAMS28 MATEMATISK STATISTIK Provkod: TEN1 05 June 2017, 14:00-18:00. English Version Kurskod: TAMS28 MATEMATISK STATISTIK Provkod: TEN1 5 June 217, 14:-18: Examiner: Zhenxia Liu (Tel: 7 89528). Please answer in ENGLISH if you can. a. You are allowed to use a calculator, the formula and

Läs mer

Dagens Nyheter STHLM Total. A Stockholm paper made by and for those that love Stockholm

Dagens Nyheter STHLM Total. A Stockholm paper made by and for those that love Stockholm Summary Dagens Nyheter STHLM Total. A Stockholm paper made by and for those that love Stockholm Our readers know the product as DagensNyheter STHLM. At the same time our advertisers know this specific

Läs mer

Lösenordsportalen Hosted by UNIT4 For instructions in English, see further down in this document

Lösenordsportalen Hosted by UNIT4 For instructions in English, see further down in this document Lösenordsportalen Hosted by UNIT4 For instructions in English, see further down in this document Användarhandledning inloggning Logga in Gå till denna webbsida för att logga in: http://csportal.u4a.se/

Läs mer

Uttagning för D21E och H21E

Uttagning för D21E och H21E Uttagning för D21E och H21E Anmälan till seniorelitklasserna vid O-Ringen i Kolmården 2019 är öppen fram till och med fredag 19 juli klockan 12.00. 80 deltagare per klass tas ut. En rangordningslista med

Läs mer

Datasäkerhet och integritet

Datasäkerhet och integritet Chapter 4 module A Networking Concepts OSI-modellen TCP/IP This module is a refresher on networking concepts, which are important in information security A Simple Home Network 2 Unshielded Twisted Pair

Läs mer

Webbreg öppen: 26/ /

Webbreg öppen: 26/ / Webbregistrering pa kurs, period 2 HT 2015. Webbreg öppen: 26/10 2015 5/11 2015 1. Du loggar in på www.kth.se via den personliga menyn Under fliken Kurser och under fliken Program finns på höger sida en

Läs mer

Module 1: Functions, Limits, Continuity

Module 1: Functions, Limits, Continuity Department of mathematics SF1625 Calculus 1 Year 2015/2016 Module 1: Functions, Limits, Continuity This module includes Chapter P and 1 from Calculus by Adams and Essex and is taught in three lectures,

Läs mer

Tentamen i Matematik 2: M0030M.

Tentamen i Matematik 2: M0030M. Tentamen i Matematik 2: M0030M. Datum: 203-0-5 Skrivtid: 09:00 4:00 Antal uppgifter: 2 ( 30 poäng ). Examinator: Norbert Euler Tel: 0920-492878 Tillåtna hjälpmedel: Inga Betygsgränser: 4p 9p = 3; 20p 24p

Läs mer

BOENDEFORMENS BETYDELSE FÖR ASYLSÖKANDES INTEGRATION Lina Sandström

BOENDEFORMENS BETYDELSE FÖR ASYLSÖKANDES INTEGRATION Lina Sandström BOENDEFORMENS BETYDELSE FÖR ASYLSÖKANDES INTEGRATION Lina Sandström Frågeställningar Kan asylprocessen förstås som en integrationsprocess? Hur fungerar i sådana fall denna process? Skiljer sig asylprocessen

Läs mer

LUNDS TEKNISKA HÖGSKOLA Institutionen för Elektro- och Informationsteknik

LUNDS TEKNISKA HÖGSKOLA Institutionen för Elektro- och Informationsteknik LUNDS TEKNISKA HÖGSKOLA Institutionen för Elektro- och Informationsteknik SIGNALBEHANDLING I MULTIMEDIA, EITA50, LP4, 209 Inlämningsuppgift av 2, Assignment out of 2 Inlämningstid: Lämnas in senast kl

Läs mer

Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 15 August 2016, 8:00-12:00. English Version

Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 15 August 2016, 8:00-12:00. English Version Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 15 August 2016, 8:00-12:00 Examiner: Xiangfeng Yang (Tel: 070 0896661). Please answer in ENGLISH if you can. a. Allowed to use: a calculator, Formelsamling

Läs mer

Anders Persson Philosophy of Science (FOR001F) Response rate = 0 % Survey Results. Relative Frequencies of answers Std. Dev.

Anders Persson Philosophy of Science (FOR001F) Response rate = 0 % Survey Results. Relative Frequencies of answers Std. Dev. Anders Persson Philosophy of Science (FOR00F) Response rate = 0 % Survey Results Legend Relative Frequencies of answers Std. Dev. Mean Question text Left pole % % Right pole n=no. of responses av.=mean

Läs mer

Biblioteket.se. A library project, not a web project. Daniel Andersson. Biblioteket.se. New Communication Channels in Libraries Budapest Nov 19, 2007

Biblioteket.se. A library project, not a web project. Daniel Andersson. Biblioteket.se. New Communication Channels in Libraries Budapest Nov 19, 2007 A library project, not a web project New Communication Channels in Libraries Budapest Nov 19, 2007 Daniel Andersson, daniel@biblioteket.se 1 Daniel Andersson Project manager and CDO at, Stockholm Public

Läs mer

District Application for Partnership

District Application for Partnership ESC Region Texas Regional Collaboratives in Math and Science District Application for Partnership 2013-2014 Applying for (check all that apply) Math Science District Name: District Contacts Name E-mail

Läs mer

Beijer Electronics AB 2000, MA00336A, 2000-12

Beijer Electronics AB 2000, MA00336A, 2000-12 Demonstration driver English Svenska Beijer Electronics AB 2000, MA00336A, 2000-12 Beijer Electronics AB reserves the right to change information in this manual without prior notice. All examples in this

Läs mer

SVENSK STANDARD SS-EN ISO

SVENSK STANDARD SS-EN ISO SVENSK STANDARD SS-EN ISO 2566-2 Handläggande organ Fastställd Utgåva Sida SVENSK MATERIAL- & MEKANSTANDARD, SMS 1999-06-30 1 1 (1+30) Copyright SIS. Reproduction in any form without permission is prohibited.

Läs mer

Thesis work at McNeil AB Evaluation/remediation of psychosocial risks and hazards.

Thesis work at McNeil AB Evaluation/remediation of psychosocial risks and hazards. Evaluation/remediation of psychosocial risks and hazards. Help us to create the path forward for managing psychosocial risks in the work environment by looking into different tools/support/thesis and benchmarking

Läs mer

Förändrade förväntningar

Förändrade förväntningar Förändrade förväntningar Deloitte Ca 200 000 medarbetare 150 länder 700 kontor Omsättning cirka 31,3 Mdr USD Spetskompetens av världsklass och djup lokal expertis för att hjälpa klienter med de insikter

Läs mer

Övning 5 ETS052 Datorkommuniktion Routing och Networking

Övning 5 ETS052 Datorkommuniktion Routing och Networking Övning 5 TS5 Datorkommuniktion - 4 Routing och Networking October 7, 4 Uppgift. Rita hur ett paket som skickas ut i nätet nedan från nod, med flooding, sprider sig genom nätet om hop count = 3. Solution.

Läs mer

Immigration Studying. Studying - University. Stating that you want to enroll. Stating that you want to apply for a course.

Immigration Studying. Studying - University. Stating that you want to enroll. Stating that you want to apply for a course. - University I would like to enroll at a university. Stating that you want to enroll I want to apply for course. Stating that you want to apply for a course an undergraduate a postgraduate a PhD a full-time

Läs mer

Evaluation Ny Nordisk Mat II Appendix 1. Questionnaire evaluation Ny Nordisk Mat II

Evaluation Ny Nordisk Mat II Appendix 1. Questionnaire evaluation Ny Nordisk Mat II Evaluation Ny Nordisk Mat II Appendix 1. Questionnaire evaluation Ny Nordisk Mat II English version A. About the Program in General We will now ask some questions about your relationship to the program

Läs mer

Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 17 August 2015, 8:00-12:00. English Version

Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 17 August 2015, 8:00-12:00. English Version Kurskod: TAIU06 MATEMATISK STATISTIK Provkod: TENA 17 August 2015, 8:00-12:00 Examiner: Xiangfeng Yang (Tel: 070 2234765). Please answer in ENGLISH if you can. a. Allowed to use: a calculator, Formelsamling

Läs mer

SWESIAQ Swedish Chapter of International Society of Indoor Air Quality and Climate

SWESIAQ Swedish Chapter of International Society of Indoor Air Quality and Climate Swedish Chapter of International Society of Indoor Air Quality and Climate Aneta Wierzbicka Swedish Chapter of International Society of Indoor Air Quality and Climate Independent and non-profit Swedish

Läs mer

FORSKNINGSKOMMUNIKATION OCH PUBLICERINGS- MÖNSTER INOM UTBILDNINGSVETENSKAP

FORSKNINGSKOMMUNIKATION OCH PUBLICERINGS- MÖNSTER INOM UTBILDNINGSVETENSKAP FORSKNINGSKOMMUNIKATION OCH PUBLICERINGS- MÖNSTER INOM UTBILDNINGSVETENSKAP En studie av svensk utbildningsvetenskaplig forskning vid tre lärosäten VETENSKAPSRÅDETS RAPPORTSERIE 10:2010 Forskningskommunikation

Läs mer

F ξ (x) = f(y, x)dydx = 1. We say that a random variable ξ has a distribution F (x), if. F (x) =

F ξ (x) = f(y, x)dydx = 1. We say that a random variable ξ has a distribution F (x), if. F (x) = Problems for the Basic Course in Probability (Fall 00) Discrete Probability. Die A has 4 red and white faces, whereas die B has red and 4 white faces. A fair coin is flipped once. If it lands on heads,

Läs mer

Application Note SW

Application Note SW TWINSAFE DIAGNOSTIK TwinSAFE är Beckhoffs safety-lösning. En översikt över hur TwinSAFE är implementerat, såväl fysiskt som logiskt, finns på hemsidan: http://www.beckhoff.se/english/highlights/fsoe/default.htm?id=35572043381

Läs mer

Integritetspolicy på svenska Integrity policy in English... 5

Integritetspolicy på svenska Integrity policy in English... 5 Innehållsförteckning / Table of content Integritetspolicy på svenska... 2 In Vino Veritas... 2 Vilka vi är... 2 Vilka personuppgifter vi samlar in och varför vi samlar in dem... 2 Namninsamlingen... 2

Läs mer

SVENSK STANDARD SS-EN ISO 9876

SVENSK STANDARD SS-EN ISO 9876 SVENSK STANDARD SS-EN ISO 9876 Handläggande organ Fastställd Utgåva Sida SVENSK MATERIAL- & MEKANSTANDARD, SMS 1999-07-30 2 1 (1+9) Copyright SIS. Reproduction in any form without permission is prohibited.

Läs mer

RADIATION TEST REPORT. GAMMA: 30.45k, 59.05k, 118.8k/TM1019 Condition D

RADIATION TEST REPORT. GAMMA: 30.45k, 59.05k, 118.8k/TM1019 Condition D RADIATION TEST REPORT PRODUCT: OP47AYQMLL Die Type: 147X FILE: OP47_LDR.xlsx DATE CODE: 95 GAMMA: 3.45k, 59.5k, 118.8k/TM119 Condition D GAMMA SOURCE: Co6 DOSE RATE: 8.6mRad(si)/s FACILITIES: University

Läs mer

SVENSK STANDARD SS-EN ISO

SVENSK STANDARD SS-EN ISO SVENSK STANDARD SS-EN ISO 8130-9 Handläggande organ Fastställd Utgåva Sida Standardiseringsgruppen STG 1999-12-10 1 1 (1+6) Copyright SIS. Reproduction in any form without permission is prohibited. Coating

Läs mer






