Collection: search-data
Files
Filename | Size | Modified | Download |
asphyxia-media | 253 mb | 2022-05-14 1:30:21 PM | |
asphyxia.json | 1.4 mb | 2022-05-21 1:32:44 PM | |
auslan-anywhere-media | 0 b | 2024-07-16 4:00:03 AM | |
auslan-anywhere.json | 2 b | Yesterday 4:00:11 AM | |
auslan-signbank.json | 6 mb | 2024-07-15 1:06:56 AM | |
community-media | 38 mb | 2022-05-14 1:49:38 PM | |
community.json | 44 kb | 2023-02-05 12:30:06 AM | |
latrobe-ig-media | 658 mb | 2023-02-04 11:10:16 PM | |
latrobe-ig.json | 862 kb | 2023-02-05 12:30:14 AM | |
organisations.json | 2 b | 2023-03-10 9:26:30 AM | |
signpedia-media | 316 mb | 2022-05-14 2:01:01 PM | |
signpedia.json | 63 kb | 2023-02-05 12:30:06 AM | |
spread-the-sign-auslan.json | 6.3 mb | 2023-12-11 1:40:14 AM | |
theatre-101-media | 598 mb | 2022-05-14 2:52:07 PM | |
theatre-101.json | 58 kb | 2022-05-23 4:51:43 PM | |
toddslan-media | 99 mb | 2022-05-14 1:42:11 PM | |
toddslan.json | 30 kb | 2023-02-05 12:30:06 AM | |
v-alford-media | 59 mb | 2023-02-04 8:37:20 AM | |
v-alford.json | 51 kb | 2023-02-05 12:30:06 AM |
Search Data
This is where you’ll find Find Sign’s search results dataset. Each JSON file represents one source of high-quality Auslan vocab or phrase usage examples.
search-data format:
A search-data json file must be a JSON object in which each key/property name is a string which uniquely represents that sign and will not change when you add or remove data. It should ideally be fairly short and work well in a URL, like “forever-1”. The value of these properties must be an object, which may contain any of these properties:
You can, if you choose, omit title, in which case it will default to the unique key of this entry in the search data file. words is always the searchable field. If words is not provided, Find Sign will make a best-effort attempt to extract it automatically from title. If both title and words are omitted, the root object property name is assumed to be the title. The words list will be transformed into word vectors and compared against the user’s search terms to rank results, so its content should usually be quite similar to the title string if both are provided. Words should not include punctuation like . or , or quote marks.
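The defaulting rules above can be sketched in Python. This is only an illustration: the function name resolve_title_and_words is hypothetical, and the word extraction (lowercase, then keep runs of letters and digits) is an assumed stand-in for Find Sign's own best-effort extraction, which may behave differently.

```python
import re


def resolve_title_and_words(key, entry):
    """Apply the title/words defaults described above to one entry.

    `key` is the entry's property name in the search data file and
    `entry` is the object stored under that key.
    """
    # title falls back to the entry's unique key
    title = entry.get("title", key)

    words = entry.get("words")
    if words is None:
        # ASSUMPTION: a simple stand-in for Find Sign's best-effort extraction.
        # Lowercase the title and keep runs of ASCII letters and digits,
        # which also drops punctuation such as . and , and quote marks.
        words = re.findall(r"[a-z0-9]+", title.lower())

    return title, words
```

For example, an entry stored under the key forever-1 with no title or words would resolve to the title forever-1, with words taken from that title.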
Newlines are rendered correctly in body text, and in the future basic markdown may be supported there, but HTML never will be.
link specifies where the user is sent when they click the title or videos. If omitted, find-sign will link to its own permalink page about the entry.
nav, if provided, will cause a breadcrumbs-like link to be rendered under the title, which looks nicer for users. It’s a great place to put some branding, and should reflect the structure of your site. If it’s missing, link will be used, and if that’s missing, a Find Sign permalink will be used by default.
tags is optional, and can be provided as a list of hashtags. Each string must not contain whitespace. It should only contain characters that are easy to type, like alphanumeric Latin characters, and may also use . and - separators. No other punctuation may be present, and the strings must not begin with a #. If you know which states a sign is used in, please include hashtags like nsw and vic in the list to indicate them.
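A quick way to check a tags list against these rules before publishing is sketched below. The helper name invalid_tags and the exact pattern are assumptions drawn from the description above, not Find Sign's own validation code.

```python
import re

# Letters and digits, plus . and - as separators.
# No whitespace, no leading #, no other punctuation.
TAG_PATTERN = re.compile(r"[A-Za-z0-9.-]+")


def invalid_tags(tags):
    """Return the tags that break the rules described above."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


# "#qld" starts with a # and "family sign" contains whitespace.
problems = invalid_tags(["nsw", "vic", "#qld", "family sign", "old-sign"])
```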
media must be present, and must be an array of Media Elements (see below).
author is optional. If provided, it should include an "id" field, which can be used for searching with @username or -@username queries. The other fields are currently not used by Find Sign and exist for future-proofing against ideas for new ways to display search results. If avatar is provided, it should be a square image in a web-compatible format like jpeg, png, gif, or webp. Username will be shown alongside hashtags in search results if available.
provider is optional. If provided, it should specify a unique id for your site/corpus, a link to a webpage about your site/corpus, a verb which is used on the Find Sign homepage when describing new entries added recently, and a friendly name for your website, also used on the homepage and feeds. The provider value should normally be the same for every entry in your search data file, though it might make sense to vary the verb (e.g. "made up" or "recorded" or "shared").
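To show how these properties might fit together, here is a sketch that builds one fuller entry in Python and serialises it to JSON. Every value is made up (signs.example is a placeholder domain), the value shapes are assumptions based on the prose above, and the property name body is inferred from the mention of body text. nav is left out because its exact shape is not described here.

```python
import json

# The provider block is normally identical for every entry in one file.
provider = {
    "id": "example-signs",                  # hypothetical unique id
    "link": "https://signs.example/about",  # page about the site/corpus
    "verb": "shared",                       # could also be "recorded", "made up"
    "name": "Example Signs",                # friendly name
}

search_data = {
    "forever-1": {
        "title": "Forever",
        "words": ["forever", "always"],
        "body": "First line of notes.\nSecond line of notes.",  # ASSUMED name
        "link": "https://signs.example/signs/forever-1",
        "tags": ["nsw", "vic"],
        "media": [
            {
                "method": "fetch",
                "url": "videos/forever-1.mp4",
                "version": "2024-01-01",
            }
        ],
        "author": {"id": "example-user"},
        "provider": provider,
    }
}

search_data_json = json.dumps(search_data, indent=2)
print(search_data_json)
```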
Media Elements
A media element is an object with method set to "fetch" and url set to either a relative or absolute https/http URL that points directly to a video file (like .mp4, .mkv, etc). If a relative path is provided, it will be understood as relative to where you host your search-data.json, like how <a> links in HTML work.
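Relative resolution of this kind matches what Python's urllib.parse.urljoin does, so it can be used to preview where a media url will point. The helper name and the example URLs are hypothetical, and whether Find Sign resolves every edge case the same way is an assumption.

```python
from urllib.parse import urljoin


def resolve_media_url(search_data_url, media_url):
    """Resolve a media url against the URL the search data file is hosted at."""
    return urljoin(search_data_url, media_url)


base = "https://signs.example/data/search-data.json"

# A relative path is resolved next to the search data file.
relative = resolve_media_url(base, "videos/forever-1.mp4")

# An absolute URL is used unchanged.
absolute = resolve_media_url(base, "https://cdn.example/forever-1.mp4")
```

Here relative becomes https://signs.example/data/videos/forever-1.mp4, while absolute stays as given.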
You should also add a version property to this object, with any string value which will change when you modify your media (like a hash of the file’s contents, or a file last-modified date). If you don’t provide a version, find-sign will not normally redownload your videos, so changes won’t be detected for a long time.
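One way to produce a version string that changes whenever the video changes is to hash the file's contents. The sketch below uses SHA-256 from Python's standard library. The function names are hypothetical, and any other string that changes with the file (such as a last-modified timestamp) would work equally well.

```python
import hashlib


def content_version(path, chunk_size=1 << 20):
    """Return a hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as video_file:
        for chunk in iter(lambda: video_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def media_element(url, local_path):
    """Build a media element whose version tracks the local file's contents."""
    return {
        "method": "fetch",
        "url": url,
        "version": content_version(local_path),
    }
```

Calling media_element("videos/forever-1.mp4", "site/videos/forever-1.mp4") while generating the search data file would then refresh the version automatically each time the video is re-exported.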
You may also provide a clipping property, which is an object containing optional start and end properties. If start is specified, it must be a number, which is how many seconds find-sign should skip into the start of the video before transcoding, clipping off that many seconds from the beginning. If end is specified, it is how many seconds deep into the video the clip should extend. If both are provided, end must be a larger number than start. start defaults to 0 and end defaults to the intrinsic duration of the video file.
If you have a video that contains several signs, you can reference the same video url in multiple search-data entries, and use clipping to control which section of the video is presented to the user. You could also use it to skip past intro logos or exclude end credits from the loop seen on find-sign.
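The clipping rules, and the pattern of several entries sharing one video, can be sketched as follows. The function clip_range, the entry keys, and the file name are all hypothetical, and this is not find-sign's own code.

```python
def clip_range(clipping, duration):
    """Return the (start, end) seconds a clip covers.

    `clipping` is the optional clipping object (or None) and `duration`
    is the intrinsic duration of the video file in seconds.
    """
    clipping = clipping or {}
    start = clipping.get("start", 0)        # defaults to the very beginning
    end = clipping.get("end", duration)     # defaults to the end of the file
    # The format requires end > start when both are given. This check also
    # rejects a start at or past the end of the video, which is an extra
    # assumption on top of the documented rule.
    if end <= start:
        raise ValueError("clipping end must be larger than start")
    return start, end


# Two entries cut different signs out of the same video file.
shared_video = "videos/greetings.mp4"
entries = {
    "hello-1": {
        "media": [{"method": "fetch", "url": shared_video,
                   "clipping": {"start": 4, "end": 7}}],
    },
    "goodbye-1": {
        "media": [{"method": "fetch", "url": shared_video,
                   "clipping": {"start": 9.5, "end": 12}}],
    },
}
```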
Minimal example of search data
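As an illustration only, a file with nothing but the mandatory media property could be generated like this. The keys and video file names are made up, and this is a sketch rather than the project's own example.

```python
import json

# Each key doubles as the title, and words would be derived from it.
minimal_search_data = {
    "forever": {
        "media": [{"method": "fetch", "url": "forever.mp4"}],
    },
    "thank-you": {
        "media": [{"method": "fetch", "url": "thank-you.mp4"}],
    },
}

minimal_json = json.dumps(minimal_search_data, indent=2)
print(minimal_json)
```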
It is possible to publish an extremely minimal search data file like above. Title will be implied from object keys, and words extracted from title. The only truly mandatory field is media. Downsides to this approach:
- not providing version on media means changes to the video file won’t be noticed for a long time
- any small change to the object key to edit the title will break permalinks and may cause media to be downloaded and encoded again in some cases
- you have no control over how entries are presented in the discovery feed on the homepage, and without hashtags listing the states where a sign is used, the information is less useful
If you can provide more info, please do.
Copyrights
Find Sign is a hobby project run at a loss with no profit motive and no actual profit or gain for the people running it. As such, Australian copyright law isn’t generally an issue for Find Sign’s operations. If you use data from Find Sign, you should know that Find Sign does not provide you any license to do anything with it, and if your activities are commercial in nature you may be impacted by copyright law. If you’re just doing language research or making free tools that don’t make money, you should be fine, I think.
More importantly, please don’t use the data to do anything that upsets people featured in the index. If you can, reach out and ask them if they’re ok with what you want to do with it. Check the original websites for copyright information.