Data files

We shall mostly use Excel-type spreadsheets to process our materials and produce data files. When the information is rich and complex, however, it may be better to organize the data in a database such as FileMaker or Access.

Spreadsheets

Naming:

French version of this section

File names: experience has shown the need to codify file names so that files can be retrieved easily and their content identified without opening them. The same rule applies to Excel files, maps, photos, etc., because these tend to become numerous. Proposed scheme for Excel files:

NamePlace_Content-Sppt_Year

This formula identifies the territory to which the data relate (French Concession, etc.), the nature of the data, and the year (or years). To designate the various territories in Shanghai over time, we shall adopt a simple code:

  • IS = International Settlement
  • FC = French Concession
  • CA = Chinese administration (1843-1926)
  • SZF = Shizhengfu = Chinese municipality (1927-1945)
  • CM = Chinese municipality (1945-1954)

The territory code is chosen according to the date on which the document was produced and the institution that created it, not the period the data cover (e.g. SZF may have produced statistical series covering the period 1918-1925, but it remains a SZF source).

In practice, this yields file names such as SZF_Population_1932, CM_Foreign_1945-48, IS_Pop-Housing_1920, etc.
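The naming scheme can be checked automatically. The following is a minimal sketch in Python, not part of the project's tooling: the regular expression and the function name parse_file_name are assumptions made for illustration, based only on the three example names above.

```python
import re

# Territory codes from the list above.
TERRITORY_CODES = {"IS", "FC", "CA", "SZF", "CM"}

# Assumed pattern: CODE_Content_Year or CODE_Content_Year-Year,
# where Content may contain hyphen-linked parts (e.g. Pop-Housing).
FILE_NAME_RE = re.compile(
    r"^(?P<territory>[A-Z]+)_(?P<content>[A-Za-z]+(?:-[A-Za-z]+)*)_"
    r"(?P<start>\d{4})(?:-(?P<end>\d{2}|\d{4}))?$"
)


def parse_file_name(stem):
    """Split a file name (without extension) into its parts.

    Returns a dict, or None when the name does not follow the scheme
    or uses an unknown territory code.
    """
    match = FILE_NAME_RE.match(stem)
    if match is None or match.group("territory") not in TERRITORY_CODES:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for stem in ("SZF_Population_1932", "CM_Foreign_1945-48", "Population 1932"):
        print(stem, "->", parse_file_name(stem))
```

A name that returns None should be renamed before the file is shared.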

Content of cells:

When preparing a spreadsheet (Excel file), the following guidelines shall apply:

Rule no. 1:

Only one kind of information should appear in a cell. Never mix two or more kinds of information in the same cell.

This is an example of how information about incidents involving dancing girls found in the Shenbao needs to be split up to be processed in Excel and hence GIS:

Rule no. 2:

Field names shall never contain blank spaces. If you use two words, link them with the underscore sign _ (e.g. Title_1, Name_Pin). You can use full names in the name field for your own information, but for Access and GIS processing, field names shall not exceed eight characters. You can keep two header rows to handle field names: one with the full name (e.g. Name_Pinyin) and one with the 8-character name (e.g. Name_Pin).
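Deriving the short name from the full name can be sketched as follows. This is an illustration only: the helper names short_field_name and short_field_names are hypothetical, and plain truncation is an assumed strategy that happens to reproduce the Name_Pinyin / Name_Pin example.

```python
def short_field_name(full_name, max_len=8):
    """Link words with underscores, then cut the name to max_len characters."""
    linked = "_".join(full_name.split())
    return linked[:max_len]


def short_field_names(full_names, max_len=8):
    """Shorten every name and refuse to continue if two names collide."""
    result = {}
    seen = {}
    for full in full_names:
        short = short_field_name(full, max_len)
        if short in seen:
            raise ValueError(f"{full!r} and {seen[short]!r} both shorten to {short!r}")
        seen[short] = full
        result[full] = short
    return result


if __name__ == "__main__":
    print(short_field_names(["Name_Pinyin", "Title 1", "Street Name"]))
```

Truncation can map two different full names to the same short name, which is why the second helper raises an error instead of silently producing duplicate fields.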

Rule no. 3:

Never mix text and figures in the same cell. You may retain part of the information as a record for yourself (e.g. street numbers in Shanghai may appear as 346B), but in the final file for GIS the “B” must be removed once you have ascertained its meaning and the actual location.
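A mixed value such as 346B can be separated into a numeric part and a letter part before the final file is prepared. The sketch below assumes that house numbers consist of digits optionally followed by letters; the function name split_street_number is hypothetical. It does not decide what the letter means, which still has to be checked by hand.

```python
import re

# Assumed shape of a house number: digits, then optional letters (e.g. 346B).
NUMBER_RE = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def split_street_number(raw):
    """Return (number, letter_suffix), or None if the value has another shape."""
    match = NUMBER_RE.match(raw)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    for raw in ("346B", "12", "B"):
        print(repr(raw), "->", split_street_number(raw))
```

The number goes into the column used for GIS, and the suffix can be kept in a separate column as a personal record.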

Rule no. 4:

Never write dates in Chinese (such as 1945年4月30日). Spreadsheet and GIS software will not recognize them as dates. If you need to retain the original date in Chinese, record it in a separate column and write the date in a regular computer format (1945/4/30 in the US or Chinese system; 30/4/1945 in the French system).
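Converting a date written with 年, 月 and 日 can be sketched as below. This is a minimal illustration with hypothetical function names. It assumes the year, month and day are written in Arabic numerals in the Western calendar; dates in Chinese numerals or in Republican-era years are not handled and return None.

```python
import re
from datetime import date

# Matches dates such as 1945年4月30日 (Arabic numerals only).
CN_DATE_RE = re.compile(r"^\s*(\d{4})年(\d{1,2})月(\d{1,2})日\s*$")


def parse_chinese_date(text):
    """Return a datetime.date, or None if the text has another shape.

    An impossible date (e.g. 2月30日) raises ValueError from date().
    """
    match = CN_DATE_RE.match(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def format_ymd(d):
    """Year/month/day, as in the US or Chinese system described above."""
    return f"{d.year}/{d.month}/{d.day}"


def format_dmy(d):
    """Day/month/year, as in the French system described above."""
    return f"{d.day}/{d.month}/{d.year}"


if __name__ == "__main__":
    parsed = parse_chinese_date("1945年4月30日")
    print(format_ymd(parsed), format_dmy(parsed))
```

The original Chinese string stays in its own column; only the converted value goes into the date column.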

Rule no. 5:

Keep column labels as short and concise as possible. Avoid reproducing the original heading. What matters is to characterize the information in the column. If you plan to turn Excel files into a Microsoft Access database, keep in mind the eight-character limit on field names given in Rule no. 2.

Rule no. 6:

Street address: this can be a very complex matter in Shanghai. The basic rule is to split the address into its separate elements. For an address like 235 Route Ratard, your file should include three columns: one for Street Number, one for Street Type and one for Street Name. If the name of the street is in Chinese (e.g. 南京西路345号), you will need to add one column for Street Pinyin and one for Street Suffix (here “xilu”).
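The three-column split for a Western address can be sketched as follows. Everything specific here is an assumption made for illustration: the list of street types is incomplete, the column names St_Num, St_Type and St_Name are hypothetical, and the function only handles the simple shapes "number type name" (French order) and "number name type" (English order). Chinese addresses are not covered.

```python
# Assumed, incomplete list of street types; extend it from your sources.
STREET_TYPES = {"route", "rue", "avenue", "boulevard", "road", "street"}


def split_western_address(address):
    """Split e.g. '235 Route Ratard' into number, type and name columns.

    Returns None when the address does not fit either supported shape.
    """
    tokens = address.split()
    if len(tokens) < 3 or not tokens[0][0].isdigit():
        return None
    number, rest = tokens[0], tokens[1:]
    if rest[0].lower() in STREET_TYPES:
        # French order: the type comes before the name.
        street_type, name = rest[0], " ".join(rest[1:])
    elif rest[-1].lower() in STREET_TYPES:
        # English order: the type comes after the name.
        street_type, name = rest[-1], " ".join(rest[:-1])
    else:
        return None
    return {"St_Num": number, "St_Type": street_type, "St_Name": name}


if __name__ == "__main__":
    print(split_western_address("235 Route Ratard"))
    print(split_western_address("12 Nanking Road"))
```

Addresses that return None should be split by hand rather than forced into the pattern.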

Rule no. 7:

Invalid characters: beware that certain characters will produce errors in an Access database and in GIS. You should avoid using any of them in any table field name. There is no complete list, but the main invalid characters are:

  Grave accent (`)
  Exclamation mark (!)
  Period (.)
  Brackets ([ ])
  Leading space
  Non-printable characters
  Greater than sign (>)
  Less than sign (<)
  Asterisk (*)
  Colon (:)
  Caret (^)
  Plus sign (+)
  Backslash (\)
  Equal sign (=)
  Ampersand (&)
  Slash mark (/)

For further directions: http://support.microsoft.com/kb/826763
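Field names can be screened against the list above before a file is imported. The sketch below is an illustration with a hypothetical function name. It checks only the characters listed in this rule, which the text itself describes as incomplete, so a name that passes is not guaranteed to be accepted by Access or by GIS software.

```python
# Characters taken from the list in Rule no. 7.
INVALID_CHARS = set("`!.[]><*:^+\\=&/")


def field_name_problems(name):
    """Return a list of problems found in a field name (empty if none)."""
    problems = []
    if name.startswith(" "):
        problems.append("leading space")
    for ch in name:
        if ch in INVALID_CHARS:
            problems.append(f"invalid character {ch!r}")
        elif not ch.isprintable():
            problems.append(f"non-printable character {ch!r}")
    return problems


if __name__ == "__main__":
    for name in ("Name_Pin", " Name", "Pop.1932", "Pop/Hous"):
        print(repr(name), "->", field_name_problems(name))
```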

Preparing Excel files

- First get the list of standardized street names. The names of streets, both in English/French (for the foreign settlements) and in pinyin, have been standardized in the GIS database. Make sure you follow the spelling in the database, as it will save much time in processing your data. This will also avoid misspellings and personal adaptations. You can download a list of standardized street names below.

List of standardized street names

If in doubt, get in touch with the IAO GIS specialist (Isabelle.Durand@ens-lyon.fr) to obtain the list of street names as registered in our database.

- For Western names, always create two columns: one for the proper name of streets (e.g. Montigny, Edward VII, etc.), and one for the suffix (rue, street, etc.).

Exceptions: in the case of Western names of small streets or alleys used as street designations (not as a lilong designation), e.g. Lau Dong Ka Loong, just reproduce the name in its original form in the “proper name” column.

- Street names shall always follow the original spelling, including for streets with Chinese names in the Settlements and the Chinese municipality; always use the transliteration used at the time (e.g. Chu Pao San, Whampoo, etc.). Always refer to the IAO database list.

- Foreign settlements: use English or French names up to August 1945. This is an arbitrary demarcation line for practical purposes, even if the foreign settlements were abolished in July 1943. Use Chinese names in pinyin transliteration from August 1945 onward.
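The demarcation rule for the foreign settlements can be written as a small helper. This is a sketch under an assumption: the text gives only the month, so the cut-off is taken here to be 1 August 1945. The function name and the returned labels are hypothetical.

```python
from datetime import date

# Assumed cut-off: the text says "August 1945" without giving a day.
PINYIN_FROM = date(1945, 8, 1)


def street_name_system(document_date):
    """Return which naming system applies to a foreign-settlement address."""
    return "pinyin" if document_date >= PINYIN_FROM else "western"


if __name__ == "__main__":
    print(street_name_system(date(1943, 7, 31)))
    print(street_name_system(date(1946, 1, 15)))
```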

Please make sure you respect the standard rules for pinyin transliteration. Do not invent your own system! If you are unsure about these rules, please refer to the Wikipedia pinyin page (http://en.wikipedia.org/wiki/Pinyin#Orthography).

- For Chinese names, always create two columns: one for the proper name of streets (e.g. Beijing, Sichuan, etc.), and one for the suffix (lu, zhilu, etc.). Always write suffixes such as “nanlu”, “donglu”, etc. as one word, except when there are sections (Zhongshan dong yi lu).

  • Capitals only on the proper name (Nanjing, Fuzhou, etc.); no capitals on “lu”, “dong”, etc.
  • For names that have more than two characters, always merge them into one word (Dashalao, Qingyuanhuan, etc.).
  • When the Western name has a prefix, as in North Shansi Road, do not separate the prefix from the proper name in the pinyin transliteration: beishanxi lu

Rules for lilong and local place names:

- In Chinese: write them as a single unit: 药水弄, 康家桥, 牛桥浜

- In pinyin: write them as a single unit, with the first letter capitalized: Yaoshuilong, Kangjiaqiao, Niuqiaobang
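The pinyin rule for lilong and local place names amounts to joining the syllables and capitalizing only the first letter. A minimal sketch, assuming the syllables have already been transliterated separately (the function name is hypothetical):

```python
def lilong_pinyin(syllables):
    """Join pinyin syllables into one word with an initial capital."""
    return "".join(s.strip().lower() for s in syllables).capitalize()


if __name__ == "__main__":
    print(lilong_pinyin(["yao", "shui", "long"]))
    print(lilong_pinyin(["Kang", "Jia", "Qiao"]))
```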

Street number

- In the Excel file, list the street numbers in a single column and reproduce the original number from the source (e.g. 12, 5A, etc.).

- In the Excel file, when the street number is not indicated but replaced by the name of a lilong and a number in the lilong, create two separate columns: one for the name of the lilong, one for the number in the lilong. Do not mix street numbers and lilong names in the same column!

You will find below a simple template which you can adapt to your needs, depending on the nature of the processed document and its source.

address_template-.xlsx

Database

To be developed

data-files.txt · Last modified: 2013/06/04 21:46 by chenriot