Collection: search-data

Files

Filename                      Size     Modified
asphyxia-media                253 MB   2022-05-14 1:30:21 PM
asphyxia.json                 1.4 MB   2022-05-21 1:32:44 PM
auslan-anywhere-media         0 B      Today 4:00:03 AM
auslan-anywhere.json          2 B      Today 4:00:03 AM
auslan-signbank.json          6 MB     Today 1:40:45 AM
community-media               38 MB    2022-05-14 1:49:38 PM
community.json                44 KB    2023-02-05 12:30:06 AM
latrobe-ig-media              658 MB   2023-02-04 11:10:16 PM
latrobe-ig.json               862 KB   2023-02-05 12:30:14 AM
organisations.json            2 B      2023-03-10 9:26:30 AM
signpedia-media               316 MB   2022-05-14 2:01:01 PM
signpedia.json                63 KB    2023-02-05 12:30:06 AM
spread-the-sign-auslan.json   6.3 MB   2023-12-11 1:40:14 AM
theatre-101-media             598 MB   2022-05-14 2:52:07 PM
theatre-101.json              58 KB    2022-05-23 4:51:43 PM
toddslan-media                99 MB    2022-05-14 1:42:11 PM
toddslan.json                 30 KB    2023-02-05 12:30:06 AM
v-alford-media                59 MB    2023-02-04 8:37:20 AM
v-alford.json                 51 KB    2023-02-05 12:30:06 AM

Search Data

This is where you’ll find Find Sign’s search results dataset. Each JSON file represents one source of high-quality Auslan vocabulary or phrase usage examples.

Search Data Format

A search-data JSON file must be a JSON object in which each key (property name) is a string that uniquely identifies one sign and does not change when you add or remove data. Keys should ideally be fairly short and work well in a URL, like “forever-1”. The value of each property must be an object, which may contain any of these properties:

{
  "title": "Any plain text title that should show as a clickable heading in the search result",
  "words": ["list", "of", "words"],
  "body": "a string description of your video, which may include\na few different\nlines",
  "link": "https://link-to-root.tld/some/section/specific/entry.html",
  "nav": [
    ["Site Name", "https://link-to-root.tld/"],
    ["Subsection name", "https://link-to-root.tld/some/section"],
    ["Specific Page", "https://link-to-root.tld/some/section/specific/entry.html"]
  ],
  "tags": ["list", "of", "hashtags"],
  "media": [
    {
      "method": "fetch",
      "url": "https://media.your-site.tld/path/to/whatever.mp4",
      "version": "version-1"
    }
  ],
  "author": {
    "id": "persons-username",
    "name": "Person's Name",
    "link": "https://your-site.tld/user/profile",
    "avatar": "https://your-site.tld/user/profile/avatar.jpeg"
  },
  "provider": {
    "id": "identifier-for-provider",
    "name": "Friendly Name of Provider",
    "verb": "documented",
    "link": "https://your-site.tld/"
  }
}
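If you want to check that your keys follow the guidance above (short, stable, URL-friendly), a small self-check like the one below may help. This is a hypothetical helper, not part of Find Sign, and the 64-character limit is an arbitrary illustrative choice. It simply tests whether a key survives URL escaping unchanged.

```python
from urllib.parse import quote


def is_url_friendly_key(key: str, max_length: int = 64) -> bool:
    """Return True if key is non-empty, short, and needs no URL escaping."""
    # quote() leaves ASCII letters, digits and "_.-~" untouched and
    # percent-encodes everything else (safe="" means "/" is encoded too),
    # so an unchanged result means the key can go straight into a URL.
    return 0 < len(key) <= max_length and quote(key, safe="") == key


for key in ["forever-1", "How are you?"]:
    print(key, is_url_friendly_key(key))
# forever-1 True
# How are you? False
```

A key that fails this check still works as a key; it just produces less tidy permalinks.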

title is optional; if you omit it, it defaults to this entry’s unique key in the search-data file. words is always what gets searched. If words is not provided, Find Sign makes a best-effort attempt to extract words automatically from title (or from the key, when title is also omitted). The words list is transformed into word vectors and compared against users’ search terms to rank results, so its content should usually be quite similar to the title string when both are provided. Words should not include punctuation such as . or , or quote marks.
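The fallback chain described above (title from the key, words from the title) can be sketched as follows. The word-extraction rule here, keeping runs of word characters so punctuation and quote marks are dropped, is an assumption for illustration; Find Sign’s own extraction may differ.

```python
from __future__ import annotations

import re


def resolve_title_and_words(key: str, entry: dict) -> tuple[str, list[str]]:
    """Apply the documented fallbacks: title <- key, words <- title."""
    title = entry.get("title", key)
    words = entry.get("words")
    if words is None:
        # Hypothetical extraction rule: keep runs of word characters, which
        # drops punctuation such as ".", "," and quote marks.
        words = re.findall(r"\w+", title)
    return title, words


print(resolve_title_and_words("How are you?", {}))
# ('How are you?', ['How', 'are', 'you'])
```

Providing words explicitly avoids relying on whatever extraction Find Sign applies.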

Newlines are rendered correctly in body text. Basic markdown may be supported there in the future, but HTML never will be.

link specifies where the user is sent when they click the title or videos. If it is omitted, Find Sign links to its own permalink page for the entry.

nav, if provided, renders a breadcrumb-style trail of links under the title, which looks nicer for users. It’s a great place to put some branding, and it should reflect the structure of your site. If nav is missing, link is used instead, and if that is missing too, a Find Sign permalink is used.
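The link and nav fallbacks can be summarised in a short sketch. The permalink URL scheme below (PERMALINK_BASE on a .example domain) is hypothetical, and showing the fallback link as a single breadcrumb labelled with its own URL is an assumption about presentation, not documented behaviour.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical permalink scheme; Find Sign's real permalink URLs may differ.
PERMALINK_BASE = "https://find-sign.example/sign/"


def resolve_link(key: str, entry: dict) -> str:
    """Where clicking the title or video goes: link, else the permalink."""
    return entry.get("link") or PERMALINK_BASE + quote(key, safe="")


def resolve_breadcrumbs(key: str, entry: dict) -> list[tuple[str, str]]:
    """(label, url) pairs shown under the title."""
    nav = entry.get("nav")
    if nav:
        return [(label, url) for label, url in nav]
    # Without nav, fall back to a single crumb showing the resolved link
    # (using the URL as its own label is an assumption of this sketch).
    url = resolve_link(key, entry)
    return [(url, url)]


print(resolve_link("forever-1", {}))
# https://find-sign.example/sign/forever-1
```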

tags is optional and can be provided as a list of hashtags. Each tag must not contain whitespace and should only use characters that are easy to type, such as alphanumeric Latin characters, optionally with . and - as separators. No other punctuation may be present, and tags must not begin with a #. If you know which states a sign is used in, please include state hashtags such as nsw and vic.
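Before publishing, you could check your tags against these rules with a regular expression. This sketch assumes the only allowed separators are . and -, alongside ASCII letters and digits; adjust the pattern if your reading of the rules differs.

```python
from __future__ import annotations

import re

# Assumption: "." and "-" are the only separators allowed alongside ASCII
# letters and digits.
TAG_PATTERN = re.compile(r"[A-Za-z0-9.-]+")


def invalid_tags(tags: list[str]) -> list[str]:
    """Return the tags that break the rules above."""
    # fullmatch() rejects whitespace, a leading "#", and any other punctuation.
    return [tag for tag in tags if TAG_PATTERN.fullmatch(tag) is None]


print(invalid_tags(["nsw", "vic", "#qld", "sign language", "v1.2"]))
# ['#qld', 'sign language']
```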

media must be present, and must be an array of Media Elements (see below).

author is optional. If provided, it should include an id field, which is used by @username and -@username search queries. The other fields are not currently used by Find Sign; they exist to future-proof the format against ideas for new ways to display search results. If avatar is provided, it should be a square image in a web-compatible format like JPEG, PNG, GIF, or WebP. The username is shown alongside hashtags in search results when available.

provider is optional. If provided, it should specify a unique id for your site or corpus, a link to a webpage about your site or corpus, a name (a friendly name for your website, used on the Find Sign homepage and in feeds), and a verb, which the homepage uses when describing recently added entries. The provider value should normally be the same for every entry in your search-data file, though it might make sense to vary the verb (e.g. “made up”, “recorded”, or “shared”).

Media Elements

A media element is an object with method set to “fetch” and url set to either a relative or an absolute http or https URL that points directly to a video file (such as .mp4 or .mkv). A relative URL is resolved relative to where you host your search-data JSON file, the same way links in HTML <a> tags work.
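Because relative URLs resolve the same way as HTML links, Python’s standard urljoin gives a good approximation of how a media url maps to a final address. The host https://example.com/data/search-data.json is hypothetical, and the function is an illustrative helper rather than Find Sign’s own code.

```python
from urllib.parse import urljoin, urlparse


def resolve_media_url(search_data_url: str, media: dict) -> str:
    """Resolve a media element's url the way an HTML <a href> would."""
    if media.get("method") != "fetch":
        raise ValueError("media method must be 'fetch'")
    # urljoin() applies standard relative-reference resolution: absolute URLs
    # pass through unchanged, relative ones resolve against the JSON's URL.
    url = urljoin(search_data_url, media["url"])
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"media url must be http or https: {url}")
    return url


base = "https://example.com/data/search-data.json"  # hypothetical host
print(resolve_media_url(base, {"method": "fetch", "url": "./welcome-to-ltu.ogg"}))
# https://example.com/data/welcome-to-ltu.ogg
```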

You should also add a version property to this object, with any string value that changes whenever you modify your media (such as a hash of the file’s contents, or the file’s last-modified date). If you don’t provide a version, Find Sign will not normally redownload your videos, so changes won’t be detected for a long time.
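One way to generate a version is to hash the video file’s contents, so the value changes exactly when the bytes change. This sketch uses SHA-256 and truncates the hex digest to 16 characters; both the prefix and the truncation are arbitrary choices, since any changing string is acceptable.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def media_version(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Content-hash version string: changes whenever the file's bytes change."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large videos aren't loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Truncating to 16 hex characters is an arbitrary choice to keep the JSON short.
    return "sha256-" + digest.hexdigest()[:16]


# Example use when building an entry (file name is hypothetical):
# {"method": "fetch", "url": "how-are-you.mp4",
#  "version": media_version("how-are-you.mp4")}
```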

You may also provide a clipping property: an object containing optional start and end properties, both measured in seconds from the beginning of the video. If start is specified, it must be a number; Find Sign skips that many seconds into the video before transcoding, clipping them off the beginning. If end is specified, it is the point in the video, in seconds, where the clip stops. If both are provided, end must be larger than start. start defaults to 0, and end defaults to the intrinsic duration of the video file.
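The clipping rules and defaults can be expressed as a small validation function. This is a hypothetical helper: the caller supplies the video’s duration (for example from a media probing tool), and it only enforces the rules stated above, not any additional checks Find Sign may perform.

```python
from __future__ import annotations


def effective_clip(clipping: dict | None, duration: float) -> tuple[float, float]:
    """Return (start, end) in seconds after applying the documented defaults."""
    clipping = clipping or {}
    start = clipping.get("start", 0)
    end = clipping.get("end", duration)
    for name, value in (("start", start), ("end", end)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"clipping.{name} must be a number")
    if "start" in clipping and "end" in clipping and end <= start:
        raise ValueError("clipping.end must be larger than clipping.start")
    return float(start), float(end)


print(effective_clip({"start": 2.5}, duration=12.0))
# (2.5, 12.0)
```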

If you have a video that contains several signs, you can reference the same video URL in multiple search-data entries and use clipping to control which section of the video each entry presents. You can also use clipping to skip past intro logos or to exclude end credits from the loop shown on Find Sign.
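If you already know where each sign starts and ends in a longer video, entries sharing one url can be generated from a list of segments. The video file name, version string, keys, and timings below are hypothetical placeholders for illustration.

```python
from __future__ import annotations

import json


def entries_from_segments(video_url: str, version: str,
                          segments: list[tuple[str, str, float, float]]) -> dict:
    """Build one search-data entry per (key, title, start, end) segment."""
    data: dict = {}
    for key, title, start, end in segments:
        if key in data:
            raise ValueError(f"duplicate key: {key}")
        data[key] = {
            "title": title,
            "media": [{
                "method": "fetch",
                "url": video_url,  # every entry points at the same file
                "version": version,
                "clipping": {"start": start, "end": end},
            }],
        }
    return data


# Hypothetical lesson video and timings, for illustration only.
print(json.dumps(entries_from_segments(
    "phrases-lesson-1.mp4", "2023-02-05",
    [("hello-1", "Hello", 3.0, 5.5), ("thank-you-1", "Thank you", 5.5, 8.0)],
), indent=2))
```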

Minimal example of search data

{
  "Welcome to latrobe university": {
    "media": [{ "method": "fetch", "url": "./welcome-to-ltu.ogg" }]
  },
  "How are you?": {
    "media": [{ "method": "fetch", "url": "how-are-you.mp4" }]
  }
}

It is possible to publish an extremely minimal search-data file like the one above. Titles are implied from object keys, and words are extracted from titles. The only truly mandatory field is media. Downsides to this approach:

  • without a version on each media element, changes to the video file won’t be noticed for a long time
  • any change to an object key (for example, to edit a title) breaks permalinks, and in some cases may cause media to be downloaded and encoded again
  • without provider and tags, you have no control over how entries are presented in the homepage discovery feed and no way to list the states a sign is used in, which limits how useful the information is

If you can provide more info, please do.
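To see which of these gaps apply to your own file, a rough lint pass like the one below could be run before publishing. It is a hypothetical helper that flags only the issues discussed above (media must be an array; version, title, tags, and provider are recommended); it is not a validator shipped with Find Sign.

```python
from __future__ import annotations

import json


def lint_search_data(data: dict) -> list[str]:
    """List missing required and recommended fields for each entry."""
    problems = []
    for key, entry in data.items():
        media = entry.get("media")
        if not isinstance(media, list):
            problems.append(f"{key!r}: media is required and must be an array")
            continue
        for i, element in enumerate(media):
            if "version" not in element:
                problems.append(f"{key!r}: media[{i}] has no version")
        for field in ("title", "tags", "provider"):
            if field not in entry:
                problems.append(f"{key!r}: no {field}")
    return problems


minimal = json.loads("""
{ "Welcome to latrobe university": { "media": [{ "method": "fetch", "url": "./welcome-to-ltu.ogg" }] },
  "How are you?": { "media": [{ "method": "fetch", "url": "how-are-you.mp4" }] } }
""")
for problem in lint_search_data(minimal):
    print(problem)
```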

Copyrights

Find Sign is a hobby project run at a loss, with no profit motive and no actual profit or gain for the people running it. As such, Australian copyright law isn’t generally an issue for Find Sign’s operations. If you use data from Find Sign, be aware that Find Sign does not grant you any license to do anything with it, and if your activities are commercial in nature you may be affected by copyright law. If you’re just doing language research or making free tools that don’t make money, you should be fine, I think.

More importantly, please don’t use the data to do anything that upsets people featured in the index. If you can, reach out and ask them if they’re ok with what you want to do with it. Check the original websites for copyright information.